
Posts by Steve Byrnes


Blog post: “Some takes on UV & cancer”

• Part 1: In which I use my optical physics background to share some hopefully-uncontroversial observations

• Part 2: In which I boldly defy Public Health Orthodoxy on the whole UV situation

www.lesswrong.com/posts/t7GeZn...

1 week ago
Less Dead — LessWrong: “Come with me if you want to live.” – The Terminator · “‘Close enough’ only counts in horseshoes and hand grenades.” – Traditional · …

There’s a bunch of discussion at www.lesswrong.com/posts/E9xfgJ... & www.lesswrong.com/posts/NEFNs4..., including the comment sections, where company staff have been answering questions.

3 weeks ago

New blog post: “‘Act-based approval-directed agents’, for IDA skeptics” www.alignmentforum.org/posts/RKtTi8...

1 month ago
screenshot of the title and first couple paragraphs of the article at the link

New blog post: “You can’t imitation-learn how to continual-learn” www.lesswrong.com/posts/9rCTjb...

1 month ago
Podcast: Jeremy Howard is bearish on LLMs — LessWrong Jeremy Howard was recently[1] interviewed on the Machine Learning Street Talk podcast: YouTube link, interactive transcript, PDF transcript. …

I linkposted a podcast with some excerpts. Title: “Jeremy Howard is bearish on LLMs” [I mean “bearish” compared to most people in my professional circles] www.lesswrong.com/posts/hvun2m...

1 month ago

I.e., this kind of “self-play” will make them dumber and dumber, gradually at first but inexorably.

(Not sure there’s any point in arguing about this; presumably we can just wait and see. ¯\_(ツ)_/¯ )

1 month ago

…by proposing new ideas and “training themselves” when those ideas “seem right” to them, then I don’t think they’ll invent Ricci flow. Rather, I think they’ll have bad ideas that “seem right” to them, lock those ideas into the training data, and the mistakes will compound, spiraling into nonsense.

1 month ago

If math is a human enterprise that LLMs are helping with, it’s basically OK, because the LLM’s “reflexes” are all honed on good (human-provided) data. Whereas if we pretrain LLMs on exclusively pre-1970 math, put them in a sealed box for 100 years, and ask them to discover new-to-them math concepts…

1 month ago

I disagree. I think LLMs lack a general ability to notice that something doesn’t make sense, a sense that humans have, but this deficiency is not too obvious because human-provided training data can substitute in-distribution. Cf. Litt’s last blog post ↓

1 month ago

We’re talking about this kind of self-contained system that searches for proofs, and adds Lean-verified ones to the training data perpetually, right? …I would call that system “RL”.

Terminology aside, I claim it shares with RL the property of asymptoting to ruthlessness as self-play proceeds.

1 month ago

So the latter (no-ground-truth) version might not be ruthless, but I don’t think you can get to ASI that way.

(Sorry if I’m misunderstanding.)

1 month ago

Another version would lack any ground truth—it would just be LLMs judging each other. What I actually expect here is that the system would be incompetent. If the LLM judges make mistakes, the “self-play” would make the system ever more confident about those mistakes. It would spiral into nonsense.

1 month ago

OK so one version of this would have ground truth (e.g. proof assistant) gatekeeping the training data. In this case, as you self-play more and more, I claim you’ll dilute away any human kindness from pretraining, and gradually turn the LLM into a ruthless pursuer of “satisfy the proof assistant”.

1 month ago

There’s one human brain design, barely changed since Pleistocene Africa. Many copies of it, over centuries, built language, science, technology, & the whole global economy from scratch.

If an AI design can’t do that, I’d vote against even calling it “human-level” let alone “ASI”.

So I guess “yes”

1 month ago
“Sharp Left Turn” discourse: An opinionated review — LessWrong The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than a...

This requires a kind of open-ended continual autonomous learning and figuring-things-out and putting those things in the weights (not context window). Nobody has yet invented that for LLMs (though they’re sure trying). See also §1 of www.lesswrong.com/posts/2yLyT6...

2 months ago

This part ↓ (including the link www.lesswrong.com/posts/xJWBof...) might help explain what I have in mind by “ASI”. There isn’t a quadrillion-dollar market for humans with linear-time SAT solvers. Whereas ASIs could run an ever-growing global economy themselves, including self-reproducing etc.

2 months ago

(3) Even if we got past those two hurdles, I don’t think the results would be great for reasons here ↓

[& sorry if I’m missing your point :) ]

2 months ago

(2) If people tried that, I expect they would explore a space of RL reward functions in which EVERY possibility leads to ruthless sociopaths. (I claim evolution does weird unorthodox things with reward functions, beyond the imaginings of RL researchers, see alignmentforum.org/posts/xw8P8H... )

2 months ago

(1) That’s very unlikely to happen even if it’s a good idea. (Note that almost nobody does that in RL today. Generally, outer-loop searches in ML are super expensive.)

2 months ago

Social instincts are part of the RL reward function, not the trained model. So in theory, an RL programmer could do an outer-loop search over RL reward functions, as evolution did. This is true! But, some problems are:

2 months ago
Why we should expect ruthless sociopath ASI — LessWrong (Fictional) Optimist: So you expect future artificial superintelligence (ASI) “by default”, i.e. in the absence of yet-to-be-invented techniques, to be a ruthless sociopath, happy to lie, cheat, and s...

In this post, I make my case, including why we should expect superintelligence to be MUCH MORE ruthless than either humans or LLMs. (2/2) www.lesswrong.com/posts/ZJZZEu...

2 months ago

New post: “Why we should expect ruthless sociopath ASI”

A rift between super-pessimists like me and the “merely” AI-concerned is an intuition that future AI will be kinda like a ruthless sociopath. I claim it’s a sound intuition, but it might seem to come from left field… (1/2)

2 months ago
The brain is a machine that runs an algorithm — LessWrong Some people say “the brain is a computer”. Other people say “well, the brain is not really a computer, because, like, what’s the hardware versus the software?” I agree: “the brain is a computer” is ki...

Blog post: “The brain is a machine that runs an algorithm” www.lesswrong.com/posts/eKGjwR...

2 months ago

I dunno, feels pretty major to me. Here’s the changelog.

2 months ago 1 0 0 0
Post image

My post from last week, “The nature of LLM algorithmic progress”, is now a heavily-rewritten version 2! Thanks commenters for setting me straight on a number of points :) www.lesswrong.com/posts/sGNFtW...

2 months ago

Blog post: “In (highly contingent!) defense of interpretability-in-the-loop ML training”.

Using interpretability as input into a loss function / reward function has a bad rap (and deservedly so). But there’s a specific version of it that might work. www.alignmentforum.org/posts/ArXAyz...

2 months ago

Blog post: “The nature of LLM algorithmic progress”

A bit of Cunningham’s Law energy with this one: spicy hot takes, far outside my area of expertise. Feedback welcome! www.lesswrong.com/posts/sGNFtW...

2 months ago

The part where I agree: we need to get there, somehow or other—right now we don’t have the deep understanding required for high-reliability engineering, so we’d better get it! Link again: www.alignmentforum.org/posts/hiigux... (3/3)

2 months ago

The part where I disagree: some people say that if we “just” apply known best practices, everything will be fine. I summarize what those best practices are, and argue that applying those best practices to AGI, in our current state of understanding, is impossible. (2/3)

2 months ago

New blog post: “Are there lessons from high-reliability engineering for AGI safety?” People sometimes suggest that high-reliability engineering is a model for how AGI safety could or should work. I agree in some ways and disagree in other ways. (1/3) www.alignmentforum.org/posts/hiigux...

2 months ago