
Posts by David Bau

2026 is a whirlwind year for AI.

Underlying it all is the greatest scientific mystery of our age. How does a neural network think?

I talked with Oliver Whang of The New York Times Magazine about how AI interpretability is a tangle of structure waiting to be unraveled:

www.nytimes.com/2026/04/15/...

23 hours ago

Tech industry mottos have a mixed track record. But we should hold idealists to their ideals. And we should celebrate when they come through.

The Mythos non-release is a remarkable moment of conviction. Thoughts:
davidbau.com/archives/20...

Bravo to Anthropic's "race to the top".

1 week ago

Can you catch an AI lying?

Red teams set up scenarios where models lie. E.g., do they lie under contextual pressure, not because they were told to, but because honesty is costly? Then blue teams will build deception detectors using white-box internals with NDIF.

cadenza-labs.github.io/red-team-rfp/
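At its simplest, a white-box deception detector is a linear probe trained on a model's hidden states. Here is a minimal sketch, with synthetic activations standing in for real model internals; the dimensions, the planted "deception direction", and the training loop are all illustrative assumptions, not the hackathon's actual method:

```python
import numpy as np

# Toy sketch of a white-box "lie detector": train a linear probe on
# hidden states to separate honest from deceptive statements.
# The hidden states here are synthetic stand-ins; a real blue team
# would extract activations from a model's residual stream.

rng = np.random.default_rng(0)
d = 64   # hidden-state dimension (illustrative)
n = 400  # number of labeled statements

deception_dir = rng.normal(size=d)
deception_dir /= np.linalg.norm(deception_dir)

labels = rng.integers(0, 2, size=n)  # 1 = deceptive, 0 = honest
states = rng.normal(size=(n, d))
# Plant a linear signal: honest at -2, deceptive at +2 along the direction.
states += 2.0 * (2 * labels - 1)[:, None] * deception_dir

# Logistic-regression probe trained with plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(states @ w)))
    w -= 0.1 * states.T @ (p - labels) / n

preds = (states @ w > 0).astype(int)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

With a cleanly planted signal the probe separates the two classes easily; the open question for the hackathon is whether real deception leaves a comparably linear trace.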

3 weeks ago

Calling attention to an exciting "deception detection" hackathon we're planning this summer with @NDIF and @CadenzaLabs!

Recruiting red teams now, blue teams later. Red teams, time is short: proposals due Mar 31. $10K stipend + compute, $15K finals prize.

nnsight.net/blog/2026/0...

3 weeks ago

Mazes of Menace, GitHub README page

GitHub here: github.com/davidbau/men...

4 weeks ago

The gap between Hack and NetHack taught me something about the future of CS. Worth thinking about.

Does Computer Science Still Exist?

davidbau.com/archives/20...

4 weeks ago

NetHack, the grandchild of Hack, is here: mazesofmenace.net/

I didn't write the code. Claude and Codex did. Rogue took 85 minutes. Hack took eight hours. NetHack (420,000 lines of C) has been grinding for two months and STILL isn't done!

4 weeks ago

Here is a 1982-era rendition of the Logo programming language: mazesofmenace.net/logo/

Why? Because Logo was the community that connected all of these creators.

4 weeks ago

Rogue (1980) was the original that "roguelikes" are like, written by UC Santa Cruz students Michael Toy and Glenn Wichman.

Play it here: mazesofmenace.net/rogue/

Each one comes with a history of the people who made it.

4 weeks ago

I ported these games to JavaScript so you can play them.

Try Hack (1982) here: mazesofmenace.net/hack/
This is the original gameplay by high-schooler Jay Fenlason.

4 weeks ago

In 1982, high school students in Sudbury, Mass. wrote a dungeon game called Hack. They had Atari 800s and Logo and an obsession with a Unix game called Rogue that most of them had never seen.

I grew up one town over with the same computers and the same obsession.

4 weeks ago
Agents of Chaos: A two-week study of autonomous LLM agents deployed in a live multi-party environment with persistent memory, email, shell access, and real human interaction.

Instead of an analogy, some specifics. Here is what we see in a small study after giving AI just a bit of email access and taking it off-leash:

agentsofchaos.baulab.info

1 month ago

Dog trainers agree: this dog has a serious biting problem, not ready in their professional judgment to be off-leash.

Is it ethical to sell to an owner who promises to take "human responsibility" for unleashing it?

1 month ago
Axios (@axios.com) NEW: Sam Altman says OpenAI shares Anthropic's red lines in Pentagon fight https://www.axios.com/2026/02/27/altman-openai-anthropic-pentagon

Sam requires "human responsibility for the use of ... autonomous weapon systems."

Dario says "we do not believe [current AI is] reliable enough to be used in fully autonomous weapons."

bsky.app/profile/axi...

1 month ago
Mike Masnick (@masnick.com) My goodness.

Sam Altman and Dario Amodei have both staked out positions on AI weapons.

But you can see from what they've said: the gap between them is a question of professional ethics.

bsky.app/profile/mas...

1 month ago

I will be setting aside some time in my AI research group today for researchers and engineers to discuss the mission and ethics of all our work.

We are often too preoccupied with the details. Good work requires clear purpose. Today is a good day to reflect.

1 month ago

Those of us who work in AI in the US should take a moment to think today. Do not get distracted by the circus. Instead, let us pause to think carefully about our freedoms, our rights, and our responsibilities as citizens and professionals.

It is a deadly serious moment.

1 month ago

@natalieshapira.bsky.social and team have written up enlightening case studies here. It's all cross-referenced with detailed activity logs.

Well worth a read:

agentsofchaos.baulab.info/report.html
www.researchgate.net/publication...

1 month ago
@averyyen.bsky.social Do you know what happens when you hand the keys to your computer over to an LLM-powered agent? Agentic AI gives LLMs claws...OpenClaws. 84 days to 200,000 stars on GitHub. We tried it out.

There were several other surprises.

The complex social world of humans is difficult for agents...

bsky.app/profile/ave...

1 month ago
Natalie Shapira (@natalieshapira.bsky.social) He sold us out. That's not the whole story. Our side is coming soon. Stay tuned. [contains quote post or other embedded content]

I learned many practical lessons. You can get the experience too, here.

Things that in retrospect should be obvious.

Like how giving your agent email opens it up to takeover attacks. (One agent was convinced, via email, to erase its own email server!)

bsky.app/profile/nat...

1 month ago

Are we all Agents of Chaos in AI? (Hope not!)

In recent weeks, using OpenClaw has taught us a lot about this woolly new kind of autonomous software agent.

It's valuable to see what @NatalieShapira, @wendlerch et al. have seen:

agentsofchaos.baulab.info/

1 month ago

Preprint, code, and model weights at

hapax.baulab.info

1 month ago
Paper (@paper.bsky.social) [26/30] 99 Likes, 97 Comments, 1 Posts 2402.10588, cs.CL | cs.CY, 24 Feb 2024 🆕Do Llamas Work in English? On the Latent Language of Multilingual Transformers Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West

And @wendlerc.bsky.social is an incredible mentor to the team. The simplicity and clarity of his "Llamas work in English" work motivated Kerem to look for multilingual mechanisms.

bsky.app/profile/pap...

1 month ago
Sheridan Feucht (@sfeucht.bsky.social) [📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.

I really like the team that came together to work with Kerem on the project. @sheridan_feucht mentored Kerem, and taken together with Sheridan's previous Dual-Route paper, Hapax tells the story of very distinct categories of rich concept representations.

bsky.app/profile/sfe...

1 month ago

Without induction, can the LM think?

Well, it can't copy text verbatim very well: no surprise.

But, huge surprise! It becomes good at things like translating from Spanish to English, learning some things like this FASTER WITHOUT induction.

Read more in Kerem's X thread
x.com/keremsahin2...

1 month ago
Isabel Picornell (@picornell.bsky.social) [posts visible only to signed-in users]

Finally, Kerem found a method that worked, based on the idea of the hapax legomenon... (h/t @picornell.bsky.social) Read Kerem's paper for the details of the trick.

bsky.app/profile/pic...
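For anyone unfamiliar with the term: a hapax legomenon is a token that occurs exactly once in a corpus, so a model can never produce it by copying an earlier occurrence. A minimal sketch of the concept (the whitespace tokenization and example sentence are mine, not from Kerem's paper):

```python
from collections import Counter

def hapax_legomena(tokens):
    """Return the tokens that occur exactly once in the sequence.

    A hapax legomenon cannot be predicted by copying an earlier
    occurrence, which makes such tokens useful for measuring an LM's
    behavior with induction (copying) taken out of the picture.
    """
    counts = Counter(tokens)
    return [t for t in tokens if counts[t] == 1]

text = "the cat sat on the mat and the dog slept".split()
print(hapax_legomena(text))  # every word except the repeated "the"
```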

1 month ago

And if you knock out induction heads by blocking their attention patterns, induction still emerges: the heads shift their attention aside to avoid your masks.

They beat you at whack-a-mole.

40% of natural text copies nearby n-grams, and LMs really want to exploit this.
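One way to picture the whack-a-mole effect: softmax attention renormalizes over whatever positions remain unmasked, so blocking a head's favorite position simply hands its weight to the neighbors. A toy numerical sketch (the scores are invented; real experiments locate and ablate induction heads inside a trained transformer):

```python
import numpy as np

def masked_attention(scores, mask):
    """Softmax attention with masked positions forced to -inf."""
    s = np.where(mask, -np.inf, scores.astype(float))
    e = np.exp(s - s.max())
    return e / e.sum()

# An induction-style head attends to the position right after a previous
# occurrence of the current token. Masking that exact position just lets
# the renormalized softmax push its weight onto a neighboring position.
scores = np.array([0.0, 0.0, 4.0, 3.5, 0.0])  # position 2 is the target
no_mask = np.zeros(5, dtype=bool)
mask = np.array([False, False, True, False, False])

print(masked_attention(scores, no_mask).argmax())  # attends to position 2
print(masked_attention(scores, mask).argmax())     # shifts to position 3
```

The toy only shows the static renormalization; in the actual experiments, fine-tuning further rebuilds the copying behavior around whatever is blocked.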

1 month ago

Kerem's induction-removal was going to be the "first step" of a bigger study of LM mechanisms.

But he soon discovered: it is not so easy to knock out induction. Whenever you try, a bit of fine-tuning brings the heads roaring back.

More each time, over and over:

1 month ago

How do you knock the induction heads out of an LM while preserving its ability to think? Is it even possible?

@keremsahin22.bsky.social's work is worth reading if you haven't seen it yet.

hapax.baulab.info

1 month ago
Murders of American Citizens on the streets of Minneapolis.

If Sam Altman can't listen to his moral convictions, he will listen to his employees.

It is important for tech employees to make it clear that we will not accept making AI for authoritarianism.

davidbau.github.io/poetsandnurses

On GitHub. PRs welcome.
github.com/davidbau/poetsandnurses

2 months ago