Since early 2025, we've been studying how AI tools impact productivity among developers. Previously, we found a 20% slowdown. That finding is now outdated. Speedups now seem likely, but changes in developer behavior make our new results unreliable. We're working to address this.
Posts by Charles Foster
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers.
The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
Update for those who've left the other app:
I'm now on the policy team at Model Evaluation and Threat Research (METR). Excited to be "doing AI policy" full-time.
Why aren't our AI evaluations better? AFAICT a key reason is that the incentives around them are kinda bad.
In a new post, I explain how the standardized testing industry works and write about lessons it may have for the AI evals ecosystem.
open.substack.com/pub/contextw...
This is perfect in its own way
Natural minds and natural bodies are irreplaceable. Artificial minds are costless to replace. We might value artificial bodies more, since they aren't so disposable, at least in the brief period when they are still few and costly. Could be a good period to set stories in.
When we optimize automation, we sometimes optimize *hard*. Like this automated loom working away at an inhuman 1200 RPM. Wild. youtu.be/WweMNDqDYhc?...
In Vitalik's post he mentions resolving only the highest-volume markets, which I think would address this concern even more directly, but I'm less confident I understand that version.
I dunno! Would be fun to find out
I wouldn't say it was free, really. Like, if the creator would've needed to spend $1 in subsidies on a regular market, then on each market that has a 90% chance of reversion they would need to offer $10 in subsidies to compensate, or whatever.
Since the expected payouts on each market are much lower, you probably need big subsidies to compensate. And since you don't know ahead of time which markets you will resolve, you have to fund them all.
You want traders to give you cheap but calibrated estimates for all the claims. The randomization reduces the expected size of payouts they'd receive for their bets, since each market only has a 10% chance of getting audited & resolved, but it preserves the incentive to bet their true probabilities.
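The arithmetic behind this is simple expected value. Here's a tiny sketch with illustrative numbers (only the 10% audit chance comes from the thread; the $1 subsidy is a made-up baseline): random resolution shrinks a correct trader's expected payout by the audit probability, so the creator must scale the subsidy up by 1/p to keep incentives the same.

```python
# Illustrative numbers: $1 baseline subsidy is assumed, 10% audit chance
# is from the thread. Random resolution scales expected payouts down by
# the audit probability, so the subsidy must scale up by its reciprocal.
full_subsidy = 1.00   # subsidy per market if every market resolved
audit_prob = 0.10     # chance a given market is audited & resolved

expected_payout = full_subsidy * audit_prob    # what a correct trader expects now
adjusted_subsidy = full_subsidy / audit_prob   # subsidy needed to restore incentives

print(expected_payout, adjusted_subsidy)  # 0.1 10.0
```

Note the payout in each individual audited market is 10x larger, but since only 1 in 10 markets pays out, the expected value per market matches the fully-resolved baseline.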
Let's take this to DMs :)
So if you create a prediction market because you want information on a question, you can think of the market subsidy as the compensation you're paying folks for their information.
Yeah. It's kinda subtle. With a subsidy, you're basically giving away money as an incentive. But you can increase liquidity without giving away money.
You're thinking of liquidity, which is related but not the same. Subsidy here just means committing money to increase the payouts to whoever is right.
What do you mean by "solve"?
You wanted information about all 100, so you subsidize markets on all of them. Traders can't tell ahead of time which ones will be resolved, so if your subsidies are big, they're incentivized to trade on any/all of the markets they have information about.
Feel like they've made a lot of wild statements but I don't know if anybody has collected those in one place for easy reference.
Is there a website/database out there that tracks what major AI company executives say about the future of AI?
Transformers and other parallel sequence models like Mamba are in TC⁰. That implies they can't internally map (state₀, action₀ … actionₜ) → stateₜ₊₁
But they can map (state₀, action₀, state₁, action₁ … stateₜ, actionₜ) → stateₜ₊₁
Just reformulate the task!
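A toy sketch of the reformulation above, using a made-up one-step transition (a mod-5 counter); the function names and numbers are illustrative, not from the post. The first formulation requires composing t transitions internally; the second supplies intermediate states, so predicting the next state needs only one transition.

```python
# Toy transition function (illustrative: a mod-5 counter).
def step(state: int, action: int) -> int:
    """One-step transition: add the action, mod 5."""
    return (state + action) % 5

actions = [1, 3, 2, 4]

# Hard formulation: final state from (state_0, a_0 ... a_t) requires
# composing t transitions internally -- iterated composition is the part
# a TC0 model can't do in one parallel pass for arbitrary t.
state = 0
for a in actions:
    state = step(state, a)
final_from_scratch = state

# Easy formulation: given (state_0, a_0, state_1, a_1, ..., state_t, a_t),
# predicting state_{t+1} takes only a single application of step.
interleaved, s = [], 0
for a in actions:
    interleaved.append((s, a))
    s = step(s, a)
next_state = step(*interleaved[-1])  # one-step lookup, no composition

print(final_from_scratch, next_state)  # 0 0
```

This is essentially why scratchpad/chain-of-thought-style reformulations help: the intermediate states appear in the input (or output) sequence instead of having to be computed internally.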
Atticus Geiger gave a take on when sparse autoencoders (SAEs) are/aren't what you should use. I basically agree with his recommendations. youtube.com/clip/UgkxKWI...
These days, flow-based models are typically defined via (neural) differential equations, requiring numerical integration or simulation-free alternatives during training. This paper revisits autoregressive flows, using Transformer layers to define the sequence of flow transformations directly.
It isnât super clear to me what the monthly pricing will be. Like, on the one hand in a competitive market I think the price of AI services will tend downward toward the marginal cost. But also there are only a few providers and constraints on supply. Not sure how it comes out on balance.
It might be like that! If so I would expect an experiment like this to indicate that. :)
Re: instruction-tuning and RLHF as "lobotomy"
I'm interested in experiments that look into how much finetuning can "roll back" a post-trained model to its base model perplexity on the original distribution.
Has anyone seen an experiment like this run?
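For concreteness, a minimal sketch of the metric the experiment would compare, with made-up token log-probs (the real experiment would score a post-trained model against its base model on the original pretraining distribution, before and after the "roll back" finetune):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(negative mean log-probability per token)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Made-up log-probs: the post-trained model is assumed worse on the
# original distribution; the question is how much a small finetune
# on that distribution closes the gap back toward the base model.
base_ppl = perplexity([-2.0, -1.5, -2.5])   # base model
post_ppl = perplexity([-2.5, -2.0, -3.0])   # post-trained model

print(base_ppl < post_ppl)  # True
```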
Ah. Yeah, I don't think there's anything special about services that brand themselves as "AI agents". What matters IMO is that it's opaquely doing expensive work on behalf of the client without human oversight.
For those, I think they might want to advertise their guarantees. Not certain, though.
Can you say more? Not sure that I understand.
I've been wondering when it would make sense for "AI agent" services to offer money-back guarantees. Wrote a short post about this on a flight.
open.substack.com/pub/contextw...
xkcd comic 386, with back and forth that goes: "Are you going to bed?" "I can't. This is important." "What?" "Someone is WRONG on the internet." https://xkcd.com/386/
Neat thing about real-money prediction markets is that you can get paid for doing this.