So, first version of an ML anon starter pack: go.bsky.app/VgWL5L. Kept half-anons (like me and Vic). Not all anime pfps, but generally drawn.
The OpenAI emails are interesting in that they make clear that the goal was to build an AGI and then have 1-5 people control it: www.lesswrong.com/posts/5jjk4C...
That seems...wrong.
I think for most tasks, the bottleneck is reliability, not capability. So even though capability is definitely increasing on some dimensions (for whatever reason, scaling or otherwise, I don't know), most people just don't notice. Very, very few people need the math abilities of o1-preview.
To put it another way: some folks in the NLP community would be horrified if they knew what people actually use search engines for!
It's a funny analogy, but I think the situation might be subtler than this. People use search engines for all sorts of things, not just information retrieval. For some of these other tasks, isn't it conceivable that AI would be more fit for purpose?
People in science and technology are seeing something very different from people in the humanities, but I think that's a temporary phase.
Future AI capabilities are already here—they're just not very evenly distributed.
Isn't this just a matter of different subdisciplines using the word "model" in different ways? I feel like I'm watching a mathematician complain that fields aren't just a bunch of grass; they have to be commutative.
Real-world usage spans a very broad set of tasks. Look at the data yourself if you don't believe me, e.g.:
www.nber.org/papers/w32966
And true generality is definitely an engineering goal—it's the famous G in "AGI." All frontier model companies are public and explicit about this.
I don't know of any technology adopted as fast as ChatGPT. Examples that are close (personal computers, the internet) indeed became pervasive and foundational. E.g. see www.stlouisfed.org/on-the-econo...
I've met a lot of people who are 100% certain that AI will flop. That's probably who this kind of language is aimed at. I completely agree it would be better if they hedged and said, "There's a decent chance AI will be pervasive, and we want you to help decide how we use it."
LLM-based chatbots are built for general use and in practice are used for a wide variety of things. I'm genuinely curious: what leads you to see them as application-specific artifacts? Or is this more of a normative statement, that you wish they'd be built and used in a more targeted way?
I think it sets a baseline, but not a ceiling. And LLMs have blown way past my baseline expectations for what I guessed next-token prediction would produce. Isn't it at least a reasonable hypothesis that they may be learning something deep as a byproduct of a superficial training task?
LLMs are a technique, not a tool: they're not "meant" for anything. (Is the fast Fourier transform "meant" for audio engineering or detecting nuclear tests? Why not both?) And at this point, the best LLM-based systems are far better than the average person at math. Surely that's worth exploring?
Such a good paper! And at the end there's a great summary of counterarguments and counter-counterarguments.
Oh, I see what you're saying! That is interesting, and I don't know of any studies.
The belief was that this made it easier to learn to translate the first word, which then made it easier to learn to translate the second, etc. I don't know if they ran careful experiments to show this was the mechanism.
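If memory serves, this is the source-reversal trick from early seq2seq NMT (Sutskever et al., 2014); a minimal sketch, assuming that's the mechanism in question:

```python
# Minimal sketch of the source-reversal trick (assuming that's what's
# referenced above): reversing the source sentence puts its first word
# right next to the decoder's first prediction, shortening that dependency.
def prepare_source(tokens: list[str]) -> list[str]:
    return tokens[::-1]

print(prepare_source(["the", "cat", "sat"]))  # ['sat', 'cat', 'the']
```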
I think there might be more to the story. One of the biggest AI believers I know (1) is a socially adept extrovert and (2) was incredibly skeptical, right up until LLMs became good enough to help him write a certain type of specialized code much faster.
I believe you. There seem to be dramatic differences between subdisciplines. In your work it's useless, but in chemistry, it just won a Nobel. As we figure out what universities should do, I find it helpful to take into account how different our various experiences are.
I think her analysis of the structural pressures on universities is excellent! But what I'm seeing on the ground is a mix of those pressures with "endogenous" aspects of the technology itself: its enormous utility for certain kinds of work, and its rapid improvement. Those are critical factors, too.
Excellent mini-talk! One missing variable is that many profs (in physics, chemistry, CS) are now finding AI extremely useful for their own work. That makes it harder to see as a "cheating device." This seems like a huge factor in the "pivot," one that may not be equally visible in all disciplines.
So is it fair to say your level of belief (or disbelief) would be the same if they'd used the p < 0.05 standard?
I suppose the converse question is interesting too: what grand-but-incorrect discoveries would we have made without an understanding of null hypothesis testing?
Great essay! You ask, "What are the grand discoveries that we wouldn’t have made without an understanding of null hypothesis testing?" Would the discovery of the Higgs boson count? As I understand it, the transition from "cool theory" to "Nobel prize" hinged on a p-value.
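For the curious, a quick back-of-the-envelope check (my addition): the 5-sigma discovery threshold corresponds to a one-sided p-value around 3e-7, far stricter than p < 0.05.

```python
# Back-of-the-envelope: convert the 5-sigma discovery threshold
# to a one-sided p-value. Requires scipy.
from scipy.stats import norm

print(f"{norm.sf(5):.2e}")  # ~2.87e-07, i.e. p << 0.05
```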
Yep! The argument in your paper makes sense. It was just the nonstandard use of "structural stability" that threw me. (In standard usage, e.g., the identity map on a manifold is *not* structurally stable.) Anyway, it's a great article, whatever terminology you use!
Very likely nothing will change for one inference pass, by continuity. But it's entirely possible that after many more next-token inferences you'll see a change large enough to affect which output token is produced. (This is much like roundoff error accumulating.)
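A toy illustration of what I mean, with a chaotic map standing in for the network (so an analogy only, not a claim about real transformers):

```python
# Toy analogy only: an iterated logistic map standing in for repeated
# inference passes. A 1e-9 parameter perturbation is invisible after one
# step, but the difference compounds until a thresholded "token" flips.
x_a = x_b = 0.4
r_a, r_b = 3.9, 3.9 + 1e-9  # nearly identical "parameters"

for t in range(1, 200):
    x_a = r_a * x_a * (1 - x_a)
    x_b = r_b * x_b * (1 - x_b)
    if (x_a > 0.5) != (x_b > 0.5):  # crude "output token" readout
        print(f"outputs first differ at step {t}")
        break
```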
I should say that by "behavior" I mean the result of just one inference pass, as opposed to long-term dynamics.
You're making a simpler and stronger point, I believe: behavior changes *discontinuously* with parameters, a major departure from most neural nets. Traditional "structural stability" is more subtle, and my guess is it would probably be hard to show any real-world transformer is structurally stable.
Thanks for this very useful survey! A question: what exactly is your definition of "structural stability"? Usually the term applies to dynamical systems, but how exactly is a transformer a dynamical system? (It actually looks to me like you might be talking about "continuity" instead?)
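For reference, the textbook definition I have in mind (standard dynamical-systems usage, not necessarily the paper's):

```latex
% Standard usage (Peixoto/Smale tradition): f is structurally stable if
% small C^1 perturbations leave its qualitative dynamics unchanged up to
% a change of coordinates.
\begin{definition}
  A diffeomorphism $f \colon M \to M$ of a compact smooth manifold $M$
  is \emph{structurally stable} if there is a $C^1$-neighborhood
  $\mathcal{U}$ of $f$ such that every $g \in \mathcal{U}$ is
  topologically conjugate to $f$, i.e.\ $h \circ f = g \circ h$ for
  some homeomorphism $h \colon M \to M$.
\end{definition}
```

Mere continuity, by contrast, would only ask that outputs vary continuously as the parameters do.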
Caricature by Edward Linley Sambourne from Punch in 1882, titled “Man is but a Worm,” depicting human evolution: beginning with Chaos, proceeding through worm and monkey, and culminating in Darwin himself.
OTD in 1881, Charles Darwin published his last book, on earthworms.
It reflected a long interest in animal minds: “One alternative alone is left, namely, that worms, although standing low in the scale of organization, possess some degree of intelligence.”
🧪 🦋🦫 #HistSTM #philsci #psychsky #cogsci