
Posts by Lawrence Jones

Don’t get this? Anthropic’s margin on inference is about 40%, according to various sources.

Mostly aimed at @greybunny.bsky.social (the original thread is locked): what makes you think this isn’t full price, whatever that means?

2 hours ago 0 0 0 0

We shall see though! Until just recently the best study for this was on very old models and suggested devs got slower while feeling faster.

METR have updated the study to say this is no longer the case and most new studies show 20% improvements.

I expect that will go up as people fully adopt these tools.

23 hours ago 0 0 1 0

I agree it’s difficult, that said we’ve seen substantial quantitative improvements in metrics we care about: time to fix, outstanding bugs, etc.

And there are tasks that are either not humanly feasible or achievable 3-10x as fast with AI, where the additional spend feels well worth it.

23 hours ago 1 0 1 0

Yeah, it all depends on your circumstances as to what is and isn’t ok.

Our production AI bill is almost 10x our Claude spend, which puts things in perspective.

But to the original point: if you are willing to spend this money on the product, it is likely a very useful product.

1 day ago 0 1 1 0

No worries, fwiw I’m very much on the side of OpenAI spend being way unsustainable, so we’re not at cross purposes here.

On Claude: we’ve been active users for about a year now. We track the cost carefully (it’s ~$3k/dev/month), but we get huge benefits from it, so the value exchange feels ok for now.

1 day ago 0 0 1 1

I’d add that this is much more common than you’d think if you’re on social media: most companies do it, but anyone talking publicly about token costs is either a hobbyist or the person responsible for corp spend (a very small group).

75% of Claude revenue comes from this usage.

1 day ago 0 0 1 0

Correct, same with our whole team of ~50 devs.

1 day ago 0 0 2 0

Fair, though you’re quoting a pro sub, which I think is a different discussion.

If we’re talking about professional devs buying Claude/Codex as consumers: I am a happy business user of Claude, where we pay the full price for the service rather than the subscription.

And there are literally millions of us!

1 day ago 0 0 2 0

I didn’t realise they’d already deployed ads, thanks for the correction.

But either way: if people are using it in such large quantities despite it being inaccurate and now having ads, that strongly implies it’s a great consumer product.

Imagine when the default model is more like today’s SOTA

1 day ago 1 0 0 0

Is a product offered for free with no attempt at monetisation like ads even a consumer product at that point?

1 day ago 0 0 1 1

If you want to know how much people love something watch their reaction when you take it away.

We host OpenAI’s status page, and let me tell you from our server stats, people could not be more dependent on these tools! This is not the behaviour of consumers unhappy with the product.

2 days ago 5 0 1 0

The original study was essentially GitHub Copilot line completion on GPT-4o.

Couldn’t be further from the current best-in-class tools.

2 days ago 2 0 0 0

Who would you bill this to? If you used a model to calculate this, who foots the bill?

Users wouldn’t want to, as it’s for Anthropic’s benefit.

4 days ago 4 0 0 0

That’s not what anyone I know is actually doing. Unsure who you know that made you think this is normal!

4 days ago 0 0 1 0

I agree? I can’t see the parent thread (Bluesky is failing me) but I agree with exactly what you’re saying.

5 days ago 0 0 0 0

If it’s useful: I believe LLMs are like fast food for juniors. Genuinely terrible; I don’t know how early-career folks will overcome it.

But for the normal practitioner these are so useful it’s quite crazy.

5 days ago 1 0 0 0

That study uses a mix of people recruited via social media, which skews less experienced, and that is exactly the point I’ve been making: in the hands of a professional engineer, these tools are a very different thing than in a novice’s.

It doesn’t compare to industry adoption!

5 days ago 0 0 1 0

Also please do link the studies. As far as I can tell everything links back to METR or has serious flaws, I really really want to know if there are studies that don’t!

5 days ago 0 0 1 0

Trust me, I am checking very carefully what it produces! And whenever it does something wrong I can capture how it went wrong in docs so it doesn’t happen again.

The outcome is code that matches my preferences, and docs that let others know why I preferred it that way, forever after.

I save time on the iteration.

5 days ago 1 0 0 0

That’s really not the case. I can supervise the LLM and have it write code that I properly review far faster than I can write and iterate on a concept by hand.

I can have the LLM scaffold an MVP ten times over and then carefully review and tailor what it produces.

5 days ago 1 0 1 0

When I’ve looked at other claims and followed the references they arrive at that study.

You don’t need to, but if you have the references handy do share?

But again: we are literally fixing customer bug reports autonomously now, with an agent doing all the work. There is no question about this stuff.

5 days ago 0 0 0 0

If you have studies using the latest models and tools that suggest this I would genuinely like to see them! I’d be very interested in what caused them to find this inefficiency/bias.

I’m building things in a day that would have taken weeks before, and would like to know what makes my experience different.

5 days ago 0 0 0 0

That ‘wealth of information’, if you follow the trail, points back to the original METR article. That’s the origin of that ‘fact’ and is why I quoted the article above.

There is no wealth of info, especially given there was a step change in AI capabilities just 4 months ago. Too early for it to have been studied.

5 days ago 0 0 2 0

Way better to listen to engineers who have always spoken credibly about their craft and who are now talking about how AI is totally changing how they work.

They have far less incentive to mislead and are at the top of their game, so will lead the way on new best practices.

5 days ago 0 0 1 0

I think the study is really bad! For one, open-source devs are a terrible sample group, as they often don’t have the funds to leverage AI the way companies do, but the test itself was also really flawed.

I don’t think the recent or the previous study should be taken seriously. The original study just caught on.

5 days ago 0 0 1 0

Normal product engineers. AI is just one element of building products; you still need to do all the rest of it.

It will be expected that engineers learn to work with GenAI, but the entire AI team I work with all have normal engineering backgrounds.

5 days ago 3 0 1 0

I had the most ridiculous interaction yesterday on experienceddevs on a post literally about how to build with AI.

It really wound me up and I bit back more than I would have liked to, but it was repeated “you are scamming people, you are running a scam” when I was just trying to help.

💆

5 days ago 0 0 1 0

Yeah, that’s fair, I agree; I don’t think the tokeniser would apply here, other than producing very infrequently used tokens.

5 days ago 0 0 0 0

Definitely not: I work in London and was talking about jobs here.

My company is hiring (incident) as are several others. To name a few: Elevenlabs, Granola, Plain, Jack&Jill, Semaloop.

These are five off the top of my head; I could go to ~100 companies actively recruiting in London.

5 days ago 1 0 1 0

It’s a really intoxicating mix that tells people who don’t use AI that they are right, that it doesn’t work, that people who do use it are worse than them, and that they’re too stupid to know it.

5 days ago 5 0 0 0