Tech industry mottos have a mixed track record. But we should hold idealists to their ideals. And we should celebrate when they come through.
The Mythos non-release is a remarkable moment of conviction. Thoughts:
davidbau.com/archives/20...
Bravo to Anthropic's "race to the top".
Posts by Gabriele Sarti
Mfw fiddling with probes all day but patching experiments don't pan out
Congrats!
Thank you for having me! Next time in person! 🤗
Calling attention to an exciting "deception detection" hackathon we're planning this summer! w @NDIF and @CadenzaLabs.
Recruiting red teams now, blue teams later. Red teams, time is short: proposals due Mar 31. $10K stipend + compute, $15K finals prize.
nnsight.net/blog/2026/0...
I truly believe the rapid advances in the mech interp subfield have something real to offer AI ethics researchers: a chance to look beyond the HOW of evals to the WHY, a first pass at a technical solution when we see the opportunity, and a new avenue for showing failures that prove models are not gods.
Based
At long last we have created Palantir, from the classic fantasy novel Don't Create The Palantir
🧵New paper: "Lost in Backpropagation: The LM Head is a Gradient Bottleneck"
The output layer of LLMs destroys 95-99% of your training signal during backpropagation, and this significantly slows down pretraining 👇
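The claim above is about gradient attenuation through the LM head during backpropagation. A minimal sketch of how one might probe this (my own illustration, not the paper's method; the dimensions, random data, and the norm-ratio metric are all assumptions for demonstration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d_model, vocab, batch = 64, 1000, 32  # toy sizes, not from the paper

# Hidden states entering the LM head (leaf tensor so we can read its gradient)
h = torch.randn(batch, d_model, requires_grad=True)
lm_head = nn.Linear(d_model, vocab, bias=False)

logits = lm_head(h)
logits.retain_grad()  # keep the gradient at the logits for comparison
loss = F.cross_entropy(logits, torch.randint(0, vocab, (batch,)))
loss.backward()

# Compare gradient magnitude on either side of the LM head
grad_ratio = (h.grad.norm() / logits.grad.norm()).item()
print(f"||dL/dh|| / ||dL/dlogits|| = {grad_ratio:.3f}")
```

This only measures how gradient norm changes across the output projection on random data; the paper's 95-99% figure presumably comes from a more careful analysis on real pretraining runs.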
Check out David's NetHack port!
"Complexity does not yield to speed. Judgment remains essential. The work of deciding what matters, of seeing what is hidden, of knowing when your own metrics are lying to you: this is the work that remains, and it is the work worth learning."
'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.
That said, I do agree with the criticism re: model choice. Llama 3 70B's capabilities are definitely too limited to display the kind of interesting (mis)aligned behaviors. The research question does stand regardless of model choice, though!
This is an important project! If you believe alignment faking is real, you should at least entertain the possibility of misalignment faking before drawing your conclusions.
Especially true if researchers fishing for misaligned behaviors are the ones running the evals!
BlackboxNLP is back once again at EMNLP'26! Very happy to be part of the team again, and excited for our new reproducibility track! Check it out ⬇️
tired: meta omni-translation to 1600 low-resource languages
wired: kagi translate english to mechinterp
I'm calling it: DeepSeek's new 1T-parameter model (V4)?
The style, content, and length of the reasoning are extremely similar.
My contribution to model welfare efforts for today
What's your ✨ convergent epistemic state ✨?
This morning I happened to hang out around the Harvard med school café, and all the conversations I overheard were about LLM med assistants and XAI 🫡
I was puzzled by people doing this when they have a research background in different disciplines and in some cases are employed full-time by big tech companies. Is reviewing a requirement for transitioning to research roles at Amazon/Google/Meta? 🤔
I want to talk about why AI-based mass surveillance is so dangerous, and why I would oppose it no matter which party or president is in office.
🔥Super excited to share our new demo website for 🪄Interpreto!
🖼️It is basically an explanation gallery showcasing attribution and concept-based explanations for classification and generation.
🎮Play with it: for-sight-ai.github.io/interpreto-d...
We will keep improving it, so stay tuned!
Great wrap-up for #EVALITA2026! 🔥
Glad to have helped organize this edition and to see many interesting discussions!
Great response to our task Cruciverb-IT (with Ciaccio C., @gsarti.com, Dell’Orletta F., @malvinanissim.bsky.social)!
Thanks to all co-organizers and @ailc-nlp.bsky.social! #NLProc
Great release from our engineering team! A lot of the major pain points have been addressed, and this is our first step toward supporting interpretability workflows in more realistic scenarios! Check it out!
Those of us who work in AI in the US should take a moment today to reflect. Do not get distracted by the circus. Instead, let us pause to think carefully about our freedoms, our rights, and our responsibilities as citizens and professionals.
It is a deadly serious moment.
I'm excited to share that this paper was accepted at ICLR 2026! We show that language models encode one of the most basic ingredients of a world model: the ability to distinguish plausible from implausible states. Check out the paper for more details!
See you in Rio!
Paper: arxiv.org/abs/2507.12553
Hopefully it will make it smaller tho!
In this amazing multidisciplinary collaboration, we report our early experience with the @openclaw-x.bsky.social ->
Are we all Agents of Chaos in AI? (Hope not!)
In recent weeks, using OpenClaw has taught us a lot about this woolly new kind of autonomous software agent.
It's valuable to see what @NatalieShapira, @wendlerch et al. have seen:
agentsofchaos.baulab.info/
Our research report on red-teaming stateful OpenClaw agents in the BauLab is finally out! 🥳
This awesome effort was led by @natalieshapira.bsky.social and involved 6 ClawBots and 20 researchers from various institutions.
Check it out ➡️ agentsofchaos.baulab.info