The man used to be a great educator; then he decided he would rather be a mediocre influencer.
Posts by Avik Dey
Convergence isn’t the same as correctness. If multiple LLMs share largely the same training datasets and similar biases, a common answer would just be the same error repeated. Consensus only means something when sources are independent and grounded.
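The point about correlated errors can be made concrete with a toy simulation. This is a hedged sketch, not a claim about any real model: `shared_bias` is a made-up knob for "all models copy the same mistake," and the error rates are arbitrary. When errors are independent, majority vote suppresses them; when they are shared, consensus just repeats the common error.

```python
import random

def majority_wrong(n_models, error_rate, shared_bias, rng):
    # With probability shared_bias, all models copy one common draw
    # (same training data, same bias); otherwise they err independently.
    if rng.random() < shared_bias:
        votes = [rng.random() < error_rate] * n_models
    else:
        votes = [rng.random() < error_rate for _ in range(n_models)]
    return sum(votes) > n_models / 2  # True = the consensus answer is wrong

def majority_error(shared_bias, trials=20000, seed=0):
    rng = random.Random(seed)
    hits = sum(majority_wrong(5, 0.3, shared_bias, rng) for _ in range(trials))
    return hits / trials
```

With 5 models each wrong 30% of the time, independent voting drives the consensus error well below 30%, while fully shared errors leave it at 30%: agreement tells you nothing unless the sources are independent.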
Nice.
“if employees find that AI can’t easily reproduce their work at the same level of a trained human being, and CEOs who heavily use AI find that the technology makes them more productive, doesn’t that suggest that workers can’t be replaced while CEOs could be replaced by a bot?”
Yeah, those will need to be stubbed or even commented for the first pass - thought some enterprising young person would be on top of that already.
Surprised that no one has adapted the Claude Code harness to work with OpenAI APIs yet.
Yeah, reviewer probably decided it was worth the risk to become part of the lore.
/**
 * The rules of thinking are lengthy and fortuitous. They require plenty of thinking
 * of most long duration and deep meditation for a wizard to wrap one's noggin around.
 *
 * The rules follow:
 * 1. A message that contains a thinking or redacted_thinking block must be part of a
 *    query whose max_thinking_length > 0
 * 2. A thinking block may not be the last message in a block
 * 3. Thinking blocks must be preserved for the duration of an assistant trajectory
 *    (a single turn, or if that turn includes a tool_use block then also its
 *    subsequent tool_result and the following assistant message)
 *
 * Heed these rules well, young wizard. For they are the rules of thinking, and
 * the rules of thinking are the rules of the universe. If ye does not heed these
 * rules, ye will be punished with an entire day of debugging and hair pulling.
 */
Why are Anthropic models better at coding than OpenAI’s? Because they have harnessed the power of the grand wizard!
Seriously, how these got through any sort of review cycle and survived, in modules thousands of lines long, does leave you wondering a bit!
raw.githubusercontent.com/ComeOnOliver...
R.I.P.
What does mandates mean in this context and how is it related to the national processing? #Hungarian #Election
That’s what should and would have happened except for the hyperbole that Sama put out there in the early days. The marketer that he is, he really thought they were on the cusp of AI and got caught up in the grandiosity of his own idiocy. Genpop has not really been able to break out of that nonsense.
It’s not black and white, I think. Yes, there is a bubble. But, there’s also utility that translates into recoupable dollars. The balance of those two has to level out a bit more before the call can be settled.
I think it comes from the need to feel validated. Personally, I believe that should come from within.
I see the same in other high profile LLM critics. What they are critiquing is the theoretical aspects of LLMs. But, it isn’t about theory anymore, it’s now about implementing these as systems. That’s where you take the parts that work and make it robust-er - reducing the critic-able surface area.
That’s the part people skip. Writing code is the easy part. The hard part is making it production worthy and that takes a lot more than just writing some code.
Exactly.
No coder was involved in this industrialization of technical debt.
>> “The blessing and the curse is that now everyone inside your company becomes a coder,” Michele Catasta, the president and head of AI at the startup Replit, told the NYT.
Finally, a pope in my lifetime who speaks like the Pope.
Andrej Karpathy @karpathy • 1d

Judging by my TL there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of people laughing at various quirks of the models, hallucinations, etc. Yes, I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group also is subject to the highest amount of "AI Psychosis" …
Karpathy thinking that folks who are professionally evaluating LLMs are using the app to assess capabilities is a sad cope - expected better from him.
Not going to link, you know where to find it.
Same difference 🙃
From the recent targeting outcomes we have seen, it’s possible that a human was not always in the loop.
“… the framing overstates how exclusive these capabilities are. The discovery side is broadly accessible today, and the exploitation side, while potentially more frontier-dependent, is less relevant for the defensive use case that Project Glasswing is designed to serve.”
aisle.com/blog/ai-cybe...
Don’t think it’s from a need to believe. It’s that Sulzberger and/or his buddies are invested in AI, directly or indirectly.
The one question I have for Anthropic:
If you don’t save any state between each set of 1,000 runs, how many sets does it take to reproduce the results?
That’s the trillion dollar question.
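Back-of-the-envelope on the stateless-runs question: if each set of 1,000 runs carries no state from the previous set, each set is an independent trial. Assuming (purely for illustration) some fixed per-set reproduction probability `p_success`, the number of sets until the first reproduction is geometric, with mean 1/p:

```python
def expected_sets(p_success):
    # With no state saved between sets, each set of 1,000 runs is an
    # independent Bernoulli trial; the count of sets until the first
    # reproduction follows a geometric distribution with mean 1 / p.
    return 1.0 / p_success
```

So a 1% per-set chance of reproducing the result means roughly 100 sets in expectation; the unknown, of course, is what p actually is.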
The interesting thing is that an LLM’s fuzziness here becomes a feature once it’s wrapped with an execution engine and a verifier. In a generate, execute and verify loop, that fuzziness acts as cheap exploration, and if the luck of probability blesses you, you may even find a vulnerability.
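The loop shape is simple enough to sketch. In this toy version a seeded random generator stands in for the LLM’s fuzzy proposals, and a target function that "crashes" on a narrow input range stands in for the system under test; all names and numbers here are invented for illustration.

```python
import random

def generate(rng):
    # Stand-in for an LLM call: propose a candidate input.
    # The sampling noise plays the role of the model's fuzziness.
    return rng.randint(0, 99)

def execute(candidate):
    # Stand-in target system: misbehaves only on a narrow input range.
    return "crash" if 40 <= candidate <= 42 else "ok"

def verify(result):
    # The verifier is the cheap, deterministic part of the loop.
    return result == "crash"

def fuzz_loop(seed=0, budget=1000):
    # Generate -> execute -> verify until a finding or budget exhaustion.
    rng = random.Random(seed)
    for attempt in range(1, budget + 1):
        candidate = generate(rng)
        if verify(execute(candidate)):
            return attempt, candidate
    return None
```

The design point: the generator can be sloppy precisely because the verifier is strict; wrong proposals cost only compute, while a lucky one surfaces the bug.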
Another way of saying: The model delivers only modest incremental gains at a disproportionately high inference cost, so we can no longer afford to underwrite usage for free and near-free tier users.
Yeah, BB.
The anthropomorphism came from OpenAI’s playbook.
Yes, some of us are skeptical because we are part of the first group. Others are enthusiasts because they belong in that second group. But that is a possible outcome; it’s low probability, but possible. Then what? That would have been an interesting thought experiment for that article.
Great article! Captures both scenarios: using AI after you have formed scientific judgment, and using AI in place of the process that forms scientific judgment. However, there’s another variable that’s left unexplored: what if Claude keeps getting better and better, till it becomes Alice?