
Posts by Cas (Stephen Casper)


🚨 One week left to submit your AI-gov-related research to the TAIGR workshop.

4 days ago 3 0 0 0

Now that Mythos is released, we can start the clock. I'd bet that within 9 months, a system with comparable cyber capabilities will be widely available (either open-weight or openly served). Hopefully, we have just enough time to improve cyberdefense enough to be ready.

1 week ago 2 0 0 0

Preview
open weight model safety ideas: Concrete project ideas for tamper-resistance & open-weight model safety. Stephen Casper, scasper@mit.edu. This document outlines concrete project ideas I am interested in to improve open-weight model safety by making them more resistant to harmful tampering. See the paper Open Technical Problems in...

See this public notes doc for some more of my related thoughts:

docs.google.com/document/d/...

1 week ago 2 0 0 0

If mechinterp is useful, and I'm not sure it is, then it should be able to competitively help fix the currently disappointing state of affairs for robust (tamper-resistant) unlearning.
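To make the eval concrete: robust unlearning is usually tested by letting an adversary briefly fine-tune the "unlearned" model and checking whether the removed capability comes back. A minimal sketch of that tampering eval, with a hypothetical checkpoint name and placeholder probe/attack data:

```python
# Minimal sketch of a tamper-resistance eval. The checkpoint name and the
# probe/attack sets below are hypothetical placeholders. Robust unlearning
# means the "removed" capability should NOT return after a small amount of
# adversarial fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "org/unlearned-model"  # hypothetical unlearned checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def topic_loss(model, texts):
    """Mean next-token loss on held-out probes about the unlearned topic."""
    model.eval()
    losses = []
    for t in texts:
        ids = tok(t, return_tensors="pt").input_ids
        with torch.no_grad():
            losses.append(model(ids, labels=ids).loss.item())
    return sum(losses) / len(losses)

def tamper(model, texts, steps=100, lr=2e-5):
    """Adversary: briefly fine-tune the model on a small topic corpus."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for step in range(steps):
        ids = tok(texts[step % len(texts)], return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    return model

probe_set = ["..."]   # held-out questions/passages about the removed topic
attack_set = ["..."]  # small corpus the adversary fine-tunes on

before = topic_loss(model, probe_set)
after = topic_loss(tamper(model, attack_set), probe_set)
# If `after` drops back toward pre-unlearning levels, the capability was
# only suppressed, not removed.
print(f"probe loss before tampering: {before:.3f}, after: {after:.3f}")
```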

1 week ago 3 0 1 0

If we can't isolate and remove model capabilities for a given subject, then we seem to have failed at a real-world test of the most basic goal of mechinterp: figuring out where the knowledge for a given task comes from inside a network.

1 week ago 4 0 1 0

I think that if we can't get robust (tamper-resistant) unlearning to work in LLMs, this implies we aren't good at mechinterp.

1 week ago 4 0 1 0

🧵🧵🧵
A provocation to the mechanistic interpretability researchers of the world...

1 week ago 10 1 1 0

Me in December: "Wow, freedom of information laws are awesome. I can't wait to get info."

Me now: entering my 4th month of getting bullied and gaslit by 3 governments at once.

Anyway, if you need to do US FOIA, UK FOI, or EU FOIA requests, let me know -- I have advice.

1 week ago 4 0 0 0
Preview
TAIGR @ ICML 2026 — Workshop on Technical AI Governance Research Second Workshop on Technical AI Governance Research at ICML 2026. Bridging ML researchers and policymakers in Seoul, South Korea.

taigr-workshop.com/

2 weeks ago 0 0 0 0
A city skyline at night showcases bright lights, with text promoting the TAIGR Workshop at ICML 2026 and submission deadline details.

Reasons to submit to the ICML Technical AI Gov. Research (TAIGR) workshop:
- 8-page limit
- Broad scope, AI gov-related
- Workshops don't trigger dual submission policies
- Best paper awards both overall and by category
- Great community
- Cool stickers

Deadline April 24!

2 weeks ago 2 0 1 0
Preview
TAIGR @ ICML 2026 — Workshop on Technical AI Governance Research Second Workshop on Technical AI Governance Research at ICML 2026. Bridging ML researchers and policymakers in Seoul, South Korea.

OpenReview for the #TAIGR workshop on technical AI governance research at #ICML is live as of today. (Not an April Fools' joke.)

See the call and link to OpenReview here: taigr-workshop.com

2 weeks ago 1 0 0 0

I wish more CS papers had tables of contents; it makes them much more navigable. I think one reason they're rare is that submission venues have page limits, so there's often just not room. I wish venues would conditionally relax length limits by the length of an optional ToC.
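For what it's worth, a compact ToC costs almost nothing in LaTeX; a minimal sketch (the depth and font size here are just one reasonable choice, not a prescription):

```latex
% A compact table of contents: sections only, small font, so it
% typically takes just a few lines of page space.
\setcounter{tocdepth}{1}   % list sections, omit subsections
{\small \tableofcontents}
```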

2 weeks ago 6 0 0 0

They aren't signatories to the codes, but the Act would still apply to them if they did business in the EU. Notably, though, simply releasing open models that make their way into the EU isn't enough for the Act to apply.

3 weeks ago 1 0 0 0


The EU AI Act applies to companies that do business in the EU, not just ones based there. Notably, OpenAI is a signatory of the codes of practice. So if they release another gpt-oss model, they will be accountable for getting external tampering evals, which they didn't last time.

3 weeks ago 4 1 2 0

Might also be worth clarifying that this only applies to models that are considered to pose systemic risk under the company's safety framework.

3 weeks ago 1 0 0 0
Preview
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards Filtering pretraining data prevents dangerous capabilities, doesn’t sacrifice general performance, and results in models that are resistant to tampering.

It's hard but possible to make models more resistant to harmful fine-tuning. We're working on it.

deepignorance.ai

arxiv.org/abs/2508.03153
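The core mechanism is easy to sketch. This is just the general shape of pretraining-data filtering, not the paper's actual pipeline; the blocklist terms, classifier, and threshold below are all placeholders:

```python
# Sketch of pretraining-data filtering: screen each document with a cheap
# blocklist pass, then an optional classifier. Terms and the threshold are
# placeholders, not the real filter.
import re

BLOCKLIST = re.compile(r"\b(placeholder_term_1|placeholder_term_2)\b", re.IGNORECASE)

def looks_risky(doc, classifier=None, threshold=0.5):
    """Two-stage filter: fast keyword screen, then optional classifier."""
    if BLOCKLIST.search(doc):
        return True
    if classifier is not None:
        return classifier(doc) >= threshold  # e.g., P(dual-use content)
    return False

def filter_corpus(docs, classifier=None):
    """Yield only documents that pass the safety filter."""
    for doc in docs:
        if not looks_risky(doc, classifier):
            yield doc

corpus = ["an ordinary document", "a doc mentioning placeholder_term_1"]
print(list(filter_corpus(corpus)))  # -> ['an ordinary document']
```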

3 weeks ago 3 0 1 0

Yes, pre-release. I should have mentioned that. Thanks.

3 weeks ago 1 0 1 0

Today, I realized that, taken together, Appendices 3.2, 3.3, and 3.5 of the EU AI Act Codes of Practice *unambiguously* require open-weight model developers to let external evaluators conduct adversarial fine-tuning evals on their frontier models. Good.

3 weeks ago 7 2 3 0

Could social media make us less polarized instead of more?

We tested 5 algorithms on 3 platforms with 10,000 people for 6 months during the 2024 election, and found that the answer is yes.
🧵

3 weeks ago 92 30 2 9

Just like we have mass school shootings, we now also have mass school AI non-consensual nudifications.

3 weeks ago 3 0 0 0

Announcing the Technical AI Governance Research (TAIGR) ICML workshop in July! Submissions (up to 8 pages) are due April 24. Co-submission with ICML and NeurIPS is encouraged.

taigr-workshop.com

4 weeks ago 2 2 0 0
Stephen Casper - ML Researchers as Policymakers [Alignment Workshop]
Stephen Casper (MIT CSAIL) demonstrates how ML researchers can directly influence policy by strategically conducting technical research that operationalizes ...

Do you do technical AI research? In this talk, I argue that you 🫵 should see yourself quite literally as a type of policymaker. Thanks @far.ai.

www.youtube.com/watch?v=Ekp...

1 month ago 11 0 1 1
Preview
EU set to ban AI nudification apps in wake of Grok scandal The ban, laid out in proposals seen by POLITICO, could kick in this summer.

The official text isn't out yet & will matter a lot. But Europe may be moving to treat AI nudification similarly to how pirated media or CSAM is treated: as something that, even though it will always be available, can possibly be made much less accessible.
www.politico.eu/article/eu-g...

1 month ago 10 1 0 1

Using a well-timed screenshot and my phone's cache, I was able to recover some of the since-deleted tweet from Jeremy Lewin, where he admitted that the government sees the new OpenAI contract language as just "memorializing" a vague "commitment" rather than drawing any real new lines.

1 month ago 7 1 0 0
Preview
Circular Altruism — LessWrong Followup to:  Torture vs. Dust Specks, Zut Allais, Rationality Quotes 4 …

This argument is outlined here www.lesswrong.com/posts/4ZzefK...

1 month ago 1 0 0 0

As someone who is not a fan of @anthropic.com...I think you should use Claude.

1 month ago 7 0 0 0
Stephen Casper (MIT) on stage speaking

We almost certainly won't make AI safe by making safe AI.
Others are still going to create unsafe AI.

– @scasper.bsky.social at #IASEAI2026 Open-Weight AI Risk Management Workshop

I led one of the discussion groups, and we came up with some nice new ideas for how to make open-weight models safer 😊

1 month ago 3 1 2 0

Given:
1. Last summer, frontier closed-weight model devs started to share warnings about nasty model capabilities.
&
2. Open-weight models are a few months behind closed ones.

We should not be surprised if there is a big cyber/terror incident enabled by a powerful open-weight AI model in 2026.

1 month ago 0 0 0 0