Posts by R. Keelan

Cover of On Spec number 134

On Spec @onspecmag.bsky.social ceased regular publication in late 2025. It's a magazine that's launched careers, served as an incubator for talent, and done enormous good for SFF. Published Doctorow, Czerneda, and Peter Watts.

Deserves a semiprozine nod, and a short-form editor nod for Diane Walton.

9/X

2 months ago 23 10 1 3

Historians of the future will have a diverting sideline explaining that the term "trumped up charges" actually predates the Trump presidencies

3 months ago 1 0 0 0

But there's just so obviously something there! I want people to take it seriously!

7 months ago 0 0 0 0

To be clear, I do not love that AI is within spitting distance of being able to do so much white collar work. I am a programmer and I work remotely--my job is very much at risk. In many ways, it would be a relief for me if it turned out that AI was a bubble and there was actually nothing there.

7 months ago 0 0 1 0

Or we'll spend a couple million writing custom tools and other infrastructure for the AI to use, rather than having it use stuff made for humans
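
A sketch of what that might look like: instead of pointing the model at a human GUI, you expose the internal system as a tool definition in the JSON-schema style that most tool-calling APIs accept. Everything here (the deploy_service tool and its fields) is made up for illustration, not any specific vendor's API.

```python
# Hypothetical tool definition wrapping internal infrastructure so the
# AI never has to drive a human-oriented interface. The JSON-schema
# shape is the common convention for tool-calling APIs; all names here
# are illustrative.
deploy_tool = {
    "name": "deploy_service",
    "description": "Build and deploy an internal service to staging.",
    "parameters": {
        "type": "object",
        "properties": {
            "service": {
                "type": "string",
                "description": "Service name in the monorepo.",
            },
            "git_ref": {
                "type": "string",
                "description": "Commit or branch to deploy.",
            },
        },
        "required": ["service", "git_ref"],
    },
}
```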

7 months ago 0 0 1 0

Or we'll pay a team of really smart people to spend a year breaking down as many tasks as possible into 5-minute units of work where the AI has better odds of success

7 months ago 0 0 1 0

"Ha! The AIs fails half the time when they try to do even an hour's work at once!"

Okay, but it costs $1 and takes 5 minutes. Let's run 100 in parallel and pick the best.
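
A minimal sketch of that best-of-N strategy, assuming hypothetical run_attempt and score functions (in practice: call a model, then run the test suite or apply a rubric):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_attempt(task: str) -> str:
    """Placeholder for one cheap, ~5-minute AI attempt at the task."""
    return f"attempt {random.random():.3f} at {task!r}"

def score(result: str) -> float:
    """Placeholder scorer; in practice, run tests or grade against a rubric."""
    return random.random()

def best_of_n(task: str, n: int = 100) -> str:
    # Launch n independent attempts in parallel and keep the best-scoring one.
    with ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(run_attempt, [task] * n))
    return max(results, key=score)

print(best_of_n("fix the flaky login test"))
```

Even a 50% per-attempt failure rate leaves the odds of all 100 attempts failing at effectively zero.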

7 months ago 0 0 1 0

But even if the performance improvements stopped right now, non-programmers (and even many programmers) are vastly underestimating how much more can be done with the capabilities that already exist (this is often called the "product overhang").

7 months ago 0 0 1 0

It's true that there's no guarantee that the trend will continue, but it doesn't have to continue much longer for AI to be able to accomplish a very large amount of economically useful work on its own.

7 months ago 0 0 1 0

GPT-5 is on the same trend as GPT-2, -3, -3.5, and -4. A bit above trend, actually, and the trend is that task duration doubles every 7 months.
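
Back-of-the-envelope version of that trend, taking the ~1-hour 50%-success horizon cited below as the starting point and compounding the 7-month doubling:

```python
# Project the 50%-success task horizon under a 7-month doubling time.
# The 1-hour starting point is the GPT-5 figure from METR.
horizon_hours = 1.0
for months in range(0, 43, 7):
    projected = horizon_hours * 2 ** (months / 7)
    print(f"+{months:2d} months: ~{projected:.0f} hour(s)")
# After 42 months the projection is ~64 hours, more than a working week.
```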

7 months ago 0 0 1 0

METR's research measures the ability of models to complete tasks of various lengths with some odds of success. They measure GPT-5 as being able to complete 1-hour tasks with a 50% chance of success, or 6-minute tasks with an 80% chance of success.
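
Roughly how a horizon like that is derived (a sketch of the idea, not METR's actual code): fit success probability against log task length, then read off where the fitted curve crosses 50%. The data below is invented.

```python
import numpy as np

# Toy data: task lengths in minutes, and whether the model succeeded.
lengths = np.array([2.0, 5.0, 10.0, 30.0, 60.0, 120.0, 240.0])
success = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

# Fit p(success) = 1 / (1 + exp(-(a + b * log2(length)))) by gradient descent.
x = np.log2(lengths)
a, b = 0.0, 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(a + b * x)))
    a -= 0.1 * (p - success).mean()
    b -= 0.1 * ((p - success) * x).mean()

# The 50% horizon is the task length where the fitted curve crosses 0.5.
print(f"50% horizon: ~{2 ** (-a / b):.0f} minutes")
```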

7 months ago 0 0 1 0

"How to judge model performance" has been a bit of a moving target, so I wouldn't blame you for thinking proponents of AI were engaged in goal-post moving, but I think METR's research (metr.org/blog/2025-03...) on AI's ability to complete long tasks is the current-best way to judge it.

7 months ago 0 0 1 0

The correct sequence of comparisons is 2 vs 3, 3 vs 4, and 4 vs 5, in which case GPT-5 is exactly as impressive as it should be.

In other words, GPT-4o and o3 "ate" a bunch of the GPT-4-to-5 improvement jump, which makes GPT-5 seem less impressive than it actually is.

7 months ago 0 0 1 0

But look at the timeline:

- GPT-3 was released June 2020
- ChatGPT was released November 30, 2022
- GPT-4 was released March 14, 2023
- GPT-4o was released May 13, 2024
- o3 was released April 16, 2025
- GPT-5 was released August 7, 2025

7 months ago 0 0 1 0

The disappointment with GPT-5 is mostly a mistake: people were expecting a "full GPT worth of improvement" (i.e., similar to the difference between GPT-2 and -3, or between -3 and -4) between o3 and GPT-5, and didn't get that

7 months ago 0 0 1 0

I fear we are hurtling towards a broader national crisis moment here, which is part of the reason I keep returning to strategy and optics, because the consequences of the crisis moment will not hinge on how angry or righteous you are, but how many people are angry with you.

10 months ago 296 47 8 1

I use both—Chrome for gmail, Google Maps, and general searching, then Edge for a variety of websites I regularly open for a specific purpose (e.g., banking, other bill payments, etc)

10 months ago 1 0 0 0

Most of the best parts of Star Wars over the past 40 years come from the books, games, and TV shows. People who aren't fans aren't aware that stuff exists, so they have no idea why the fans have such affection for the franchise

11 months ago 3 0 0 0

I thought that was where the clip was going!

1 year ago 1 0 0 0

At this rate MAGA will only be able to afford to rent the libs.

1 year ago 24744 5426 300 228

Modern LLMs have hundreds of billions of parameters (maybe even trillions by now). That's a lot of space to represent a lot of concepts. No one should be confident that they know when LLM performance and abilities will plateau. 8/8

1 year ago 0 0 0 0

For LLMs to write as coherently as they do on such a broad range of topics requires more than just knowledge of language, because language isn't precise enough.

"I saw a man in a park with a telescope."

Is the telescope in the park, or with the speaker? It's ambiguous. 7/n
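
The two readings, written as bracketed parses (a classic prepositional-phrase attachment ambiguity):

```python
# "I saw a man in a park with a telescope" has (at least) two parses,
# depending on where "with a telescope" attaches.
reading_1 = "(I (saw (a man (in a park)) (with a telescope)))"  # I used the telescope
reading_2 = "(I (saw (a man (in (a park (with a telescope))))))"  # the telescope is in the park
```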

1 year ago 0 0 1 0

Here's another intuition pump: how well would you have to know someone in order to predict what they'd say in certain situations? This is possible—maybe you can do this for your spouse or children or siblings—but you need to know them *really* well. 6/n

1 year ago 0 0 1 0

If you throw a ball in the air you can calculate how long it will take to hit the ground using simple math. But not just *any* math. You need the equations of motion. These aren't just random calculations. They are a model of the world encoding facts about reality. 5/n
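
The worked version, assuming a ball thrown straight up at speed v0 with no air resistance:

```python
# From the equation of motion y(t) = v0*t - (g/2)*t**2, the ball is
# back at y = 0 when t = 2*v0/g.
g = 9.81   # gravitational acceleration, m/s^2
v0 = 12.0  # launch speed, m/s
t_flight = 2 * v0 / g
print(f"Back on the ground after {t_flight:.2f} s")  # ~2.45 s
```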

1 year ago 0 0 1 0

When I saw ChatGPT, it was obvious there was more going on. This link shows the kind of thing: training to predict the next word resulted in the LLMs building increasingly detailed, comprehensive, and accurate models of the world. 4/n
transformer-circuits.pub/2025/attribu...

1 year ago 0 0 1 0

You put all the words in a bag and count how often they show up. Maybe you count pairs of words (bigrams) or triplets (trigrams) or some other sequence length (n-grams). I have seen the results from those kinds of systems and they were not good. 3/n
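
For concreteness, a minimal bigram version of that approach: count which word follows which, then "predict" the most frequent follower with no other context.

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept"
words = text.split()

# Count bigrams: how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    # Most frequent follower, ignoring everything but the last word.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (seen twice, vs. "mat" once)
```

A model like this has no idea what a cat is, and that is the mental picture "it's all statistics" leaves people with.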

1 year ago 0 0 1 0

Here is XKCD describing machine learning as a pile of algebra (no fault here, it's a 40-word webcomic). The serious version of this description (it's all statistics) conjures in people's minds the Bag of Words approach to machine learning. 2/n
xkcd.com/1838

1 year ago 0 0 1 0

Popular accounts describing LLMs as word predictors powered by statistics (or, in unusually rigorous accounts, "incredibly complex math") misled a lot of people into unwarranted AI skepticism. 1/n

1 year ago 0 0 1 0

*Chef's Kiss*

1 year ago 0 0 0 0