

I like opus 4.7, but they need to halve the adderall intake

3 days ago 0 0 0 0

increasingly sympathetic

1 week ago 356 25 25 5

though maybe it has 250k-long CoTs and it's basically its own ttc harness... who knows!

1 week ago 1 0 0 0

idk, I have a strong prior that better base models are just better at everything in a way that benchmarking (even across a huge range of evals) always understates. also, what does this mean for what they can do with more ttc on top of the base?
seems to me the closest to a GPT-n ish level stack moar layerz vibe

1 week ago 1 0 2 0

I think it's actually slightly harder, eyeballing it
Ant ECI vs Epoch ECI
Sonnet 4.5: 144 vs 147
Opus 4.5: 148 vs 150
Opus 4.6: 152 vs 155

1 week ago 1 0 1 0

I think the 5.4 pro equivalence, if you are referring to the ECI stuff, is complicated by the fact that the version of ECI they maintain internally is very different from the public version and not directly comparable

1 week ago 1 0 1 0

www.newsweek.com/joe-biden-io...

2 weeks ago 2 0 0 0

dropping a falklands-shaped nuke on the discourse with "americans are the indigenous people of the moon"

2 weeks ago 383 56 8 7

4096 dimensional subspace is 0.08, but the distribution is beta with tiny variance, so there is basically never such a direction by chance!
I think this means that if you found this direction, it's instrumentally useful to have an "idk" direction in the residual stream, which seems plausible!

1 month ago 2 0 1 1

from what I can tell, this is really hard to do for dimensionality reasons! Like if this exists, the model definitely needed it and developed it intentionally.

consider the case of d_model 4096 and a 50k vocab to be conservative: the mean length of the projection of the constant vector onto a random

1 month ago 1 0 1 0
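The dimensionality argument in this thread can be checked numerically. A minimal sketch (the helper name and sample counts are mine, not the poster's): by rotational invariance, projecting a fixed unit vector onto a random k-dimensional subspace of R^n is equivalent to projecting a random unit vector onto the first k coordinates, and the squared projection length concentrates very tightly around its mean k/n.

```python
import numpy as np

# Toy check of the thread's claim (scaled-down setup is my own):
# the squared length of the projection of a fixed unit vector onto a random
# k-dim subspace of R^n is Beta-distributed with mean k/n and tiny variance,
# so a direction with an unusually large projection never appears by chance.

rng = np.random.default_rng(0)

def proj_sq_norm(n, k, rng):
    # By rotational invariance, "random subspace + fixed vector" is equivalent
    # to "fixed subspace (first k coordinates) + random unit vector".
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    return float(np.sum(v[:k] ** 2))

n, k = 50_000, 4096  # vocab-sized ambient space, d_model-sized subspace
samples = [proj_sq_norm(n, k, rng) for _ in range(20)]
print(np.mean(samples), np.std(samples), k / n)  # mean ≈ k/n ≈ 0.082, std on the order of 1e-3
```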

tired: OpenAI has lost the Mandate of Heaven

wired: OpenAI has gained the gates of hell

1 month ago 399 48 15 3

this fucking piece of shit

1 month ago 677 73 17 2

i come bearing deleted roon tweets

1 month ago 298 17 10 3

"adam tooze comes out as a treatler"

You bolt awake on the shores of Little St James. You are not online. It is 1997 AD. You are the Treasury Secretary Larry Summers, and you have changed your mind. The future cannot come to pass. The financial industry must not be deregulated

2 months ago 453 57 16 22
that scene from challengers but the boys heads are replaced with the claude logo

3 months ago 78 3 1 0

Anthropic CEO being the only one to not put out a Manchurian candidate style post about how David Sacks is the best, most kind, and public-spirited person ever, furthering my impression of them as the least bad of those companies

4 months ago 250 26 7 1
the xkcd handgun petri dish comic except it's about how while base models outperform RL'ed models on multiple choice benchmarks at high pass@N, so does a uniform distribution

4 months ago 0 0 0 0
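The uniform-baseline point above is just arithmetic: for any policy with per-sample accuracy p, pass@N = 1 - (1 - p)^N, so even uniform guessing on a k-way multiple choice question (p = 1/k) goes to pass@N = 1 as N grows. A minimal sketch (function name is mine):

```python
# Hypothetical illustration: high pass@N on multiple choice says little on its
# own, because a uniform-random guesser also saturates the metric as N grows.

def pass_at_n(p_correct, n):
    """Probability that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p_correct) ** n

k = 4  # answer choices
for n in (1, 10, 100):
    print(n, round(pass_at_n(1.0 / k, n), 3))
# uniform guessing on 4 choices: pass@1 = 0.25, pass@100 ≈ 1.0
```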

basically someone saw this coming in like 2005, but unfortunately he started a religion about it that mostly serves to make worrying about this seem silly. also it produced marketing and seed funding for the companies that are doing it

4 months ago 186 19 10 0

"enriched" forward pass

5 months ago 0 0 0 0
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic...

makes sense, models can totally smuggle information in the kv cache across token indices, but if we suppose some intermediate computation is completely independent from the token we emit, then this info can't participate in any of the more complex stuff a la arxiv.org/abs/2402.12875 - it's just an

5 months ago 0 0 2 0

If I am an expert in layer 16 of 32 of a vanilla transformer and realize that my job at some token is to compute some sum so that it can be used down the line, I can do that, and then any attention head in layer 16 can deposit that info to any future token without any intermediate unembeddings, right?

5 months ago 3 0 2 0
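The deposit mechanism described above can be shown with a degenerate toy (shapes and the hand-rolled "attention head" here are my own illustration, not a real transformer): a value written into the residual stream at an earlier token is readable by attention at any later token directly, with no unembedding in between.

```python
import numpy as np

# Toy model: the residual stream after some layer L, as a (tokens, d_model) array.
T, D = 4, 8
resid = np.zeros((T, D))
resid[1, 3] = 42.0  # an "expert" at layer L, token 1 deposits its computed sum

def attend(resid, query_token, key_token):
    # Degenerate attention head: attends fully to key_token with an identity
    # value projection. The causal mask only requires key_token <= query_token.
    assert key_token <= query_token
    return resid[key_token]

# A later token (here token 3) reads the deposited value straight from the
# residual stream, without the info ever passing through an unembedding.
out = attend(resid, query_token=3, key_token=1)
print(out[3])  # 42.0
```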
Wrong Turn on the Dragon - Numberphile YouTube video by Numberphile

I bet heath ceramics would do it www.youtube.com/watch?v=v678...

5 months ago 0 0 0 0

the meta ai video slop TikTok is I think the first tech thing where I have been 100% on the side of the Luddites (vernacular usage, I know the actual Luddites were more complex than that, nerds). Usually I think there's a little too much of that reflexively on the left tbh, but no this shit sucks.

6 months ago 683 69 14 9

it is obscene to treat the unfortunate but (as he himself pointed out) routine murder of this Nazi shithead who believed that Jewish gelt was corrupting the blood of white Americans like a national tragedy

7 months ago 835 121 3 1

Interesting, what about muon/shampoo or other spectrum-y ones?

7 months ago 0 0 1 0

modern ai is basically a bunch of rogue google employees taking google projects that were done pretty cautiously and making them less cautious

7 months ago 89 9 0 0

Some days the Iliad being about Helen just becomes a lot more believable.

8 months ago 2 0 0 0

sorry i just got back from dropping off my wife with Bari Weiss Da Strap God

11 months ago 229 12 16 11

bar crawl on Ceres with Amos Expanse

9 months ago 150 13 2 0