2026 is a whirlwind year for AI.
Underlying it all is the greatest scientific mystery of our age. How does a neural network think?
I talked w Oliver Whang in NYTimes Magazine, on how AI interpretability is a tangle of structure waiting to be unraveled:
www.nytimes.com/2026/04/15/...
Posts by David Bau
Tech industry mottos have a mixed track record. But we should hold idealists to their ideals. And we should celebrate when they come through.
The Mythos non-release is a remarkable moment of conviction. Thoughts:
davidbau.com/archives/20...
Bravo to Anthropic's "race the top".
Can you catch an AI lying?
Red teams set up scenarios where models lie. Eg, do they lie under contextual pressure, even when not told to, but because honesty is costly? Then blue teams will build deception detectors using whitebox internals with NDIF.
cadenza-labs.github.io/red-team-rfp/
Calling attention to an exciting "deception detection" hackathon we're planning this summer! w @NDIF and @CadenzaLabs.
Recruiting red teams now, blue teams later. Red teams, time is short: proposals due Mar 31. $10K stipend + compute, $15K finals prize.
nnsight.net/blog/2026/0...
Mazes of Menace, Github Readme page
Github here: github.com/davidbau/men...
The gap between Hack and NetHack taught me something about the future of CS. Worth thinking about.
Does Computer Science Still Exist?
davidbau.com/archives/20...
NetHack, the grandchild of Hack, is here: mazesofmenace.net/
I didn't write the code. Claude and Codex did. Rogue took 85 minutes. Hack took eight hours. NetHack (420,000 lines of C) has been grinding for two months and STILL isn't done!
Here is a 1982-rendition of the Logo programming language: mazesofmenace.net/logo/
Why? Because Logo was the community that connected all of them.
Rogue (1980) was the original that "Roguelikes" are like, by UCSC students Michael Toy and Glenn Wichman.
Play it here: mazesofmenace.net/rogue/
Each one comes with a history of the people who made it.
I ported these games to JavaScript so you can play them.
Try Hack (1982) here: mazesofmenace.net/hack/
This is the original gameplay by high-schooler Jay Fenlason.
In 1982, high school students in Sudbury, Mass. wrote a dungeon game called Hack. They had Atari 800s and Logo and an obsession with a Unix game called Rogue that most of them had never seen.
I grew up one town over with the same computers and the same obsession.
Instead of an analogy, some specifics. Here is what we see in a small study after giving AI just a bit of email access and taking it off-leash:
agentsofchaos.baulab.info
Dog trainers agree: this dog has a serious biting problem, not ready in their professional judgment to be off-leash.
Is it ethical to sell to an owner who promises to take "human responsibility" for unleashing it?
Sam requires "human responsibility for the use of ... autonomous weapon systems."
Dario says "we do not believe [current AI is] reliable enough to be used in fully autonomous weapons."
bsky.app/profile/axi...
Sam Altman and Dario Amodei have both staked out positions on AI weapons.
But you can see from what they've said: the gap between them is a question of professional ethics.
bsky.app/profile/mas...
I will be adding some time in my AI research group today for researchers and engineers to discuss the mission and ethics of all our work.
We are often too preoccupied by the details. Good work requires clear purpose. Today is a good day to reflect.
Those of us who work in AI in the US today should take a moment to think today. Do not get distracted by the circus. Instead, let us pause to think carefully about our freedoms, our rights, and our responsibilities as citizens and professionals.
It is a deadly serious moment.
@natalieshapira.bsky.social and team have written up enlightening case studies here. It's all cross-referenced with detailed activity logs.
Well worth a read:
agentsofchaos.baulab.info/report.html
www.researchgate.net/publication...
There were several other surprises.
The complex social world of humans is difficult for agents...
bsky.app/profile/ave...
I learned many practical lessons. You can get the experience too, here.
Things that in retrospect should be obvious.
Like how giving your agent email opens it up to takeover attacks. (One agent was convinced, via email, to erase its own email server!)
bsky.app/profile/nat...
Are we all Agents of Chaos in AI? (Hope not!)
In recent weeks using OpenClaw has taught us a lot about this wooly new kind of autonomous software agent.
Its valuable to see what @NatalieShapira, @wendlerch et al. have seen:
agentsofchaos.baulab.info/
Preprint, code, and model weights at
hapax.baulab.info
And @wendlerc.bsky.social is an incredible mentor to the team. The simplicity and clarity of his "Llamas work in English" work motivated Kerem to look for multilingual mechanisms.
bsky.app/profile/pap...
I really like the team that came together to work with Kerem on the project. @sheridan_feucht mentored Kerem, and seen with Sheridan's previous Dual-Route paper, Hapax tells the story of very distinct categories of rich concept representations.
bsky.app/profile/sfe...
Without induction, can the LM think?
Well, it can't copy text verbatim very well: no surprise.
But, huge surprise! It becomes good at things like translating from Spanish to English, learning some things like this FASTER WITHOUT induction.
Read more in Kerem's X thread
x.com/keremsahin2...
Finally Kerem found a method that worked, based on the idea of hapax legomenon... (h/t @picornell.bsky.social) Read Kerem's paper for the details of the trick.
bsky.app/profile/pic...
And if you knock out induction heads by blocking their attention patterns, induction still emerge, shifting their attention aside to avoid your masks.
They beat you at whack-a-mole.
40% of natural text copies neaby ngrams + LMs really want to exploit this.
Kerem's induction-removal was going to be the "first step" of a bigger study of LM mechanisms.
But he soon discovered: it is not so easy to knock out induction. Whenever you try, a bit of fine-tuning brings the heads roaring back.
More each time, over and over:
How do you knock the induction heads out of an LM while preserving its ability to think? Is it even possible?
@keremsahin22.bsky.social's work is worth reading if you haven't seen it yet.
hapax.baulab.info
Murders of American Citizens on the streets of Minneapolis.
If Sam Altman can't listen to his moral convictions, he will listen to his employees.
It is important for tech employees to make it clear that we will not accept making AI for authoritarianism.
davidbau.github.io/poetsandnurses
On GitHub. PRs welcome.
github.com/davidbau/poetsandnurses