"April Fools' Day" is cultural appropriation and we in the fool community, who live with our own foolishness all year round, do not appreciate being turned into a sanitised mainstream joke. I have worked hard to overcome popular prejudice and am proud to be the president's special adviser on Iran
Posts by Andrew West
ChatGPT and Claude give you fast, referenced answers to the infinite day-to-day questions that can be answered by searching / synthesising knowledge.
They are, in my experience, usually correct (use the Thinking models and Opus). Obv you should check the references.
This is better than before.
I’m on VS Code/PhpStorm, running mostly Claude Code with Codex as a verifier. Mostly building internal tooling in our WordPress CRM, so within an auth boundary. 8 months ago Claude was a helpful tool that needed close attention even for small tasks. Now it’s breezing through complex refactors.
Sure, it’s possible it won’t get there. But so far, progress continues. The frontier models are starting to make modest contributions to maths and physics. I don’t think it’s obvious that ‘coding and some help in maths’ is the natural end point, and it seems plausible it can extend to other domains.
If you are trying to build a cancer researcher, not a classifier of X-rays, there are 15 years of research showing that the more human data you put in, the more coherent the outputs.
There have been plenty of philosophical arguments that LLMs will inherently miss some part of being human, and maybe that’s right. But it is simultaneously plausible that they will get _good enough_ to make medical advances anyway.
Ok, you keep stating things as facts that I do not agree are facts (and prob vice versa), so we have prob got as far as we can here.
All right, all I can say at this point is that this isn’t the (tentative) conclusion I had come to from the same data.
But a year ago this wasn’t possible. It might indeed plateau now, who knows. But that argument has been made for the last 3y and yet things continue to improve. Is it not worth taking seriously the possibility that this is coming?
Are you expecting that coding is a specific domain and that the same targeted RL and other techniques won’t scale to other areas? Maybe you’re correct! I just don’t see why…
ChatGPT Pro is being used for Erdős problems in maths. ChatGPT Pro is a model trained on copyrighted works, amongst many other optimisations. Isn’t it?
Sure, fair enough. I would only implore you to keep an eye on the progress being made in coding, maths, and physics, and reevaluate from time to time.
But the machines making frontier progress in physics and maths are LLMs. Maybe there’s a reason this wouldn’t extend to medical research, but it’s not obvious to me what that is.
Because the aim is to make a human-like intelligence. The research of the last 15 years shows that the more data you feed in, the more it pattern matches on all available human thought. The idea is that this leads to an ‘intelligence’ which can at least improve medical pattern matching, and hopefully innovate.
You can declare this to be true, and maybe you’re right. But my point is that the moral case is being made by thousands of machine learning engineers who disagree.
(fwiw I think the case is ‘if you think you have a pathway to a machine that can cure cancer by ingesting all IP then you should do it and take the consequences’, but I know that isn’t satisfactory)
Some behaviour isn’t good, for sure. But it’s coming nonetheless and getting better rapidly. The original sin of copyright is a major issue for many, but the maths is just out there now. The more help from smart liberals the better off we’ll all be.
Smart people like you can have major impacts this early in the process - especially because the left are leaving the whole field open for the right to impose their desires.
There is a pathway here to a tech that can make major medical advances, and it would be nice if that pathway could work well for everyone.
The tech is now good enough for plenty of meaningful use cases, while still being bad at others. That’s sufficient. The world really needs the help of smart people like you to be thinking how to deal with the real possibility of job losses, or preventing people using it lazily for important things.
You are a good guy acting in good faith, and I am a fan of your work. But a lot has changed in AI since November - there’s been a coding explosion. And it couldn’t count fingers for 2 months, 2 years ago.
Every few months, I write an updated, idiosyncratic guide on which AIs to use right now.
My new version has the most changes ever, since AI is no longer just about chatbots. To use AI you need to understand how to think about models, apps, and harnesses. open.substack.com/pub/oneusefu...
Claude Code users: if you have a mouse with a bunch of buttons I highly recommend:
—setting one to toggle+paste SuperWhisper (or speech->text app of your choice)
—mapping another to Enter
Feels absolutely decadent, but so pleasing.
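For anyone on Linux wanting the same trick without macOS tooling, a minimal sketch with xbindkeys plus xdotool could look like this (the button numbers are assumptions — spare thumb buttons are often 8 and 9, but yours may differ; check with `xev`):

```shell
# ~/.xbindkeysrc — hypothetical mapping, adjust button numbers for your mouse

# Thumb button 9: send Enter (the "mapping another to Enter" tip above)
"xdotool key Return"
  b:9

# Thumb button 8: toggle your speech-to-text app of choice
# (replace with whatever command or hotkey your dictation tool listens for)
"xdotool key super+shift+d"
  b:8
```

Run `xbindkeys` after saving, or `xbindkeys --poll-rc` to reload while testing.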
Yes exactly. It’s not a given. But I’m just kinda assuming that if you can train a model on code outputs to be as good as Opus, you can do it on things like ‘analyse this doc for legal problems’. Especially when the bar isn’t perfection, it’s just ‘better than the average human’.
There’s this odd cross-purposes thing where some people are talking about intelligence and some people are talking about utility. I’m doing the latter. My point would be that predictive text - or whatever wording you want to use - is enough.
Sure. But I would feel happier if we could quantify ‘a lot’.
I suppose to me that doesn’t feel like how markets work. Compute and tokens have only got cheaper over time. If there’s demand, someone will supply.
I suppose I see the 90% as jagged. There are some specialties where that’s true. But there are a lot of developers churning out entirely normal stuff like internal tooling and standard apps (me among them) for whom LLMs are speeding them up 5x.
But the claim isn’t that it can do _everything_. There is a lot of software that is less critical than financial systems. It’s that the tech is already good enough to do a lot, and is only improving.
But if you have a model capable of doing 90% of everyday spreadsheet management and administrative processes, are employers not going to seek it out and use it?