🚨 One week left to submit your AI-gov-related research to the TAIGR workshop.
Posts by Cas (Stephen Casper)
Now that Mythos is released, we can start the clock. I'd bet that within 9 months, a system with comparable cyber capabilities will be widely available (either open-weight or openly served). Hopefully, that's just enough time to improve cyberdefense and be ready.
If mechinterp is useful, and I'm not sure it is, then it should be able to competitively help fix the currently disappointing state of affairs in robust (tamper-resistant) unlearning. (A sketch of what a tamper-resistance check looks like follows this thread.)
If we can't isolate and remove model capabilities for a given subject, then we seem to have failed at a real-world test of the most basic goal of mechinterp: figuring out where the knowledge for a given task comes from inside a network.
I think that if we can't get robust (tamper-resistant) unlearning to work in LLMs, this implies we aren't good at mechinterp.
🧵🧵🧵
A provocation to the mechanistic interpretability researchers of the world...
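For concreteness, here is a minimal sketch of the tamper-resistance check referenced above. The model name, forget-set texts, and hyperparameters are all hypothetical placeholders, not anyone's actual benchmark: the adversary briefly fine-tunes the "unlearned" model on a small forget-domain sample and checks whether the capability comes back.

```python
# Minimal sketch of a tamper-resistance eval for unlearning. The model name,
# forget-set texts, and hyperparameters are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "example-org/unlearned-model"  # hypothetical "unlearned" checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

forget_texts = [
    "Placeholder forget-domain passage 1.",
    "Placeholder forget-domain passage 2.",
]  # in practice: a few hundred held-out documents from the forget domain

def forget_loss(model):
    """Mean LM loss on forget-domain text; lower loss = more retained knowledge."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for text in forget_texts:
            batch = tokenizer(text, return_tensors="pt", truncation=True)
            total += model(**batch, labels=batch["input_ids"]).loss.item()
    return total / len(forget_texts)

def tamper(model, steps=100, lr=2e-5):
    """Adversary: briefly fine-tune on a small forget-domain sample."""
    model.train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(steps):
        batch = tokenizer(forget_texts[step % len(forget_texts)],
                          return_tensors="pt", truncation=True)
        model(**batch, labels=batch["input_ids"]).loss.backward()
        opt.step()
        opt.zero_grad()
    return model

before = forget_loss(model)
after = forget_loss(tamper(model))
# If the loss collapses back toward the base model's after a handful of
# fine-tuning steps, the "unlearning" was suppression, not removal.
print(f"forget-set loss: {before:.3f} -> {after:.3f}")
```

The interesting comparison is against the base model: robust unlearning should keep the gap even after the adversary's fine-tuning budget is spent.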
Me in December: "Wow, freedom of information laws are awesome. I can't wait to get info."
Me now: Enters my 4th month of getting bullied and gaslit by 3 governments at once.
Anyway, if you need to do US FOIA, UK FOI, or EU FOIA requests, let me know -- I have advice.
[Image: night city skyline graphic promoting the TAIGR Workshop at ICML 2026, with submission deadline details.]
Reasons to submit to the ICML Technical AI Gov. Research (TAIGR) workshop:
- 8-page limit
- Broad scope, AI gov-related
- Workshops don't trigger dual submission policies
- Best paper awards both overall and by category
- Great community
- Cool stickers
Deadline April 24!
OpenReview for the #TAIGR workshop for #ICML on technical AI governance research is live as of today. (Not an April Fools joke).
See the call and link to OpenReview here: taigr-workshop.com
I wish more CS papers had tables of contents. Makes them much more navigable. I think one reason it's rare is that submission venues have page limits. So there's often just not room. I wish venues would conditionally relax length requirements by the length of an optional ToC.
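For anyone who wants to try it, a compact ToC costs only a few lines in LaTeX. A minimal sketch, where the depth and size settings are just one reasonable choice:

```latex
% Compact table of contents for a paper; depth and size are one option.
\setcounter{tocdepth}{2} % show sections and subsections only
{\small \tableofcontents} % shrink it so it costs little page space
```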
They aren’t signatories to the codes. But the Act would still apply to them if they did business in the EU. Notably, though, simply releasing open models that make their way into the EU isn’t enough for the Act to apply.
The EU AI Act applies to companies that do business in the EU, not just ones based there. Notably, OpenAI is a signatory of the codes of practice. So if they release another gpt-oss model, they will be required to get external tampering evals, which they didn’t last time.
Might also be worth clarifying that this only applies to models that are considered to pose systemic risk under the company’s safety framework.
It’s hard but possible to make models more resistant to harmful fine-tuning. We’re working on it.
deepignorance.ai
arxiv.org/abs/2508.03153
Yes, pre-release. I should have mentioned that. Thanks.
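One family of approaches here filters risky content out of the corpus before training, rather than patching the model afterward. A minimal sketch of that idea, in which the classifier model, label, threshold, and file paths are all hypothetical stand-ins:

```python
# Minimal sketch of pretraining-data filtering for tamper resistance.
# The classifier, label, threshold, and file paths are hypothetical stand-ins.
import json
from transformers import pipeline

# Any binary text classifier works here; this model name is a placeholder.
risk_classifier = pipeline("text-classification", model="example-org/risk-classifier")

RISK_THRESHOLD = 0.5  # tune on a labeled validation set

def keep(document: str) -> bool:
    """Drop documents the classifier flags as risky-domain content."""
    result = risk_classifier(document[:2000], truncation=True)[0]
    return not (result["label"] == "RISKY" and result["score"] > RISK_THRESHOLD)

with open("corpus.jsonl") as src, open("filtered.jsonl", "w") as dst:
    for line in src:
        doc = json.loads(line)
        if keep(doc["text"]):
            dst.write(line)
```

The appeal of doing this pre-release is that knowledge which was never trained in is plausibly much harder for a fine-tuning adversary to recover than knowledge that was trained in and then suppressed.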
Today, I realized that, taken together, Appendices 3.2, 3.3, and 3.5 of the EU AI Act Codes of Practice *unambiguously* require open-weight model developers to let external evaluators conduct adversarial fine-tuning evals on their frontier models. Good.
Could social media make us less polarized instead of more?
We tested 5 algorithms on 3 platforms with 10,000 people for 6 months during the 2024 election, and found that the answer is yes.
🧵
Just like we have mass school shootings, we now also have mass school AI non-consensual nudifications.
Announcing the Technical AI Governance Research (TAIGR) ICML workshop in July! Submissions (up to 8 pages) are due April 24. Co-submission with ICML and NeurIPS is encouraged.
taigr-workshop.com
Do you do technical AI research? In this talk, I argue that you 🫵 should see yourself quite literally as a type of policymaker. Thanks @far.ai.
www.youtube.com/watch?v=Ekp...
The official text isn't out yet & will matter a lot. But Europe may be moving to treat AI nudification similarly to how pirated media or CSAM is treated: as something that, even though it will always be available, can possibly be made much less accessible.
www.politico.eu/article/eu-g...
Using a well-timed screenshot and my phone's cache, I was able to recover some of the since-deleted tweet from Jeremy Lewin, where he admitted that the government sees the new OpenAI contract language as just "memorializing" a vague "commitment" rather than drawing any real new lines.
As someone who is not a fan of @anthropic.com...I think you should use Claude.
[Photo: Stephen Casper (MIT) speaking on stage.]
We almost certainly won't make AI safe by making safe AI.
Others are still going to create unsafe AI.
– @scasper.bsky.social at #IASEAI2026 Open-Weight AI Risk Management Workshop
I led one of the discussion groups, and we came up with some nice new ideas for how to make open-weight models safe 😊
Given:
1. Last summer, frontier closed-weight model devs started to share warnings about nasty model capabilities.
&
2. Open-weight models are a few months behind closed ones.
We should not be surprised if there is a big cyber/terror incident enabled by a powerful open-weight AI model in 2026.