Since early 2025, we've been studying how AI tools impact productivity among developers. Previously, we found a 20% slowdown. That finding is now outdated. Speedups now seem likely, but changes in developer behavior make our new results unreliable. We're working to address this.
Posts by Charles Foster
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers.
The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Update for those who've left the other app:
I'm now on the policy team at Model Evaluation and Threat Research (METR). Excited to be "doing AI policy" full-time.
Why aren't our AI evaluations better? AFAICT a key reason is that the incentives around them are kinda bad.
In a new post, I explain how the standardized testing industry works and write about lessons it may have for the AI evals ecosystem.
open.substack.com/pub/contextw...
This is perfect in its own way
Natural minds and natural bodies are irreplaceable. Artificial minds are costless to replace. We might value artificial bodies more, since they aren't so disposable, at least in the brief period when they are still few and costly. Could be a good period to set stories in.
When we optimize automation, we sometimes optimize *hard*. Like this automated loom working away at an inhuman 1200 RPM. Wild. youtu.be/WweMNDqDYhc?...
In Vitalik's post he mentions resolving only the highest-volume markets, which I think would address this concern even more directly, but I'm less confident I understand that version.
I dunno! Would be fun to find out
I wouldn't say it was free, really. Like, if the creator would've needed to spend $1 in subsidies on a regular market, then on each market that has a 90% chance of reversion they would need to offer $10 in subsidies to compensate, or whatever.
Since the expected payouts on each market are much lower, you probably need big subsidies to compensate. And since you don't know ahead of time which markets you will resolve, you have to fund them all.
You want traders to give you cheap but calibrated estimates for all the claims. The randomization reduces the expected size of payouts they'd receive for their bets, since each market only has a 10% chance of getting audited & resolved, but it preserves the incentive to bet their true probabilities.
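The arithmetic behind this is simple expected value. Here's a tiny sketch with illustrative numbers (only the 10% audit chance comes from the thread; the $1 subsidy is a made-up baseline): random resolution shrinks a correct trader's expected payout by the audit probability, so the creator must scale the subsidy up by 1/p to keep incentives the same.

```python
# Illustrative numbers: $1 baseline subsidy is assumed, 10% audit chance
# is from the thread. Random resolution scales expected payouts down by
# the audit probability, so the subsidy must scale up by its reciprocal.
full_subsidy = 1.00   # subsidy per market if every market resolved
audit_prob = 0.10     # chance a given market is audited & resolved

expected_payout = full_subsidy * audit_prob    # what a correct trader expects now
adjusted_subsidy = full_subsidy / audit_prob   # subsidy needed to restore incentives

print(expected_payout, adjusted_subsidy)  # 0.1 10.0
```

Note the payout in each individual audited market is 10x larger, but since only 1 in 10 markets pays out, the expected value per market matches the fully-resolved baseline.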
Let's take this to DMs :)
So if you create a prediction market because you want information on a question, you can think of the market subsidy as the compensation you're paying folks for their information.
Yeah. It's kinda subtle. With a subsidy, you're basically giving away money as an incentive. But you can increase liquidity without giving away money.
You're thinking of liquidity, which is related but not the same. Subsidy here just means committing money to increase the payouts to whoever is right.
What do you mean by "solve"?
You wanted information about all 100, so you subsidize markets on all of them. Traders can't tell ahead of time which ones will be resolved, so if your subsidies are big, they're incentivized to trade on any/all of the markets they have information about.
Feel like they've made a lot of wild statements but I don't know if anybody has collected those in one place for easy reference.
Is there a website/database out there that tracks what major AI company executives say about the future of AI?
Transformers and other parallel sequence models like Mamba are in TC⁰. That implies they can't internally map (state₀, action₀ … actionₜ) → stateₜ₊₁
But they can map (state₀, action₀, state₁, action₁ … stateₜ, actionₜ) → stateₜ₊₁
Just reformulate the task!
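A toy sketch of the reformulation above, using a made-up one-step transition (a mod-5 counter); the function names and numbers are illustrative, not from the post. The first formulation requires composing t transitions internally; the second supplies intermediate states, so predicting the next state needs only one transition.

```python
# Toy transition function (illustrative: a mod-5 counter).
def step(state: int, action: int) -> int:
    """One-step transition: add the action, mod 5."""
    return (state + action) % 5

actions = [1, 3, 2, 4]

# Hard formulation: final state from (state_0, a_0 ... a_t) requires
# composing t transitions internally -- iterated composition is the part
# a TC0 model can't do in one parallel pass for arbitrary t.
state = 0
for a in actions:
    state = step(state, a)
final_from_scratch = state

# Easy formulation: given (state_0, a_0, state_1, a_1, ..., state_t, a_t),
# predicting state_{t+1} takes only a single application of step.
interleaved, s = [], 0
for a in actions:
    interleaved.append((s, a))
    s = step(s, a)
next_state = step(*interleaved[-1])  # one-step lookup, no composition

print(final_from_scratch, next_state)  # 0 0
```

This is essentially why scratchpad/chain-of-thought-style reformulations help: the intermediate states appear in the input (or output) sequence instead of having to be computed internally.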
Atticus Geiger gave a take on when sparse autoencoders (SAEs) are/aren't what you should use. I basically agree with his recommendations. youtube.com/clip/UgkxKWI...
These days, flow-based models are typically defined via (neural) differential equations, requiring numerical integration or simulation-free alternatives during training. This paper revisits autoregressive flows, using Transformer layers to define the sequence of flow transformations directly.
It isnât super clear to me what the monthly pricing will be. Like, on the one hand in a competitive market I think the price of AI services will tend downward toward the marginal cost. But also there are only a few providers and constraints on supply. Not sure how it comes out on balance.
It might be like that! If so I would expect an experiment like this to indicate that. :)
Re: instruction-tuning and RLHF as "lobotomy"
I'm interested in experiments that look into how much finetuning can "roll back" a post-trained model to its base model perplexity on the original distribution.
Has anyone seen an experiment like this run?
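For concreteness, a minimal sketch of the metric the experiment would compare, with made-up token log-probs (the real experiment would score a post-trained model against its base model on the original pretraining distribution, before and after the "roll back" finetune):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(negative mean log-probability per token)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Made-up log-probs: the post-trained model is assumed worse on the
# original distribution; the question is how much a small finetune
# on that distribution closes the gap back toward the base model.
base_ppl = perplexity([-2.0, -1.5, -2.5])   # base model
post_ppl = perplexity([-2.5, -2.0, -3.0])   # post-trained model

print(base_ppl < post_ppl)  # True
```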
Ah. Yeah, I don't think there's anything special about services that brand themselves as "AI agents". What matters IMO is that it's opaquely doing expensive work on behalf of the client without human oversight.
For those, I think they might want to advertise their guarantees. Not certain, though.
Can you say more? Not sure that I understand.
I've been wondering when it would make sense for "AI agent" services to offer money-back guarantees. Wrote a short post about this on a flight.
open.substack.com/pub/contextw...
xkcd comic 386, with back and forth that goes: "Are you going to bed?" "I can't. This is important." "What?" "Someone is WRONG on the internet." https://xkcd.com/386/
Neat thing about real-money prediction markets is that you can get paid for doing this.