I have the new Gorillaz album playing on a loop and I can't stop.
Posts by Dan Glass
I wrote a framework for securing agentic AI that I figured I'd share here - part 2 below. Comments welcome.
dan.glass/2026/02/24/t...
I wrote a framework for securing agentic AI that I figured I'd share here - part 1 below. Comments welcome.
dan.glass/2026/02/15/t...
I wrote a thing that can help an information security pro measure the risk of an AI agent and put controls in place to better protect their enterprise from potential misalignment.
dan.glass/2026/02/15/t...
These aren't teammates. They're not junior analysts. They're statistical accelerants that make it faster to go in whatever direction your prompt implies. If that direction is flawed, they'll get you to the worst-case scenario with terrifying efficiency. /6
Every published jailbreak, every exploit demo, every safety paper (including this one) becomes training data for future models. We're not just red-teaming current models, we're teaching future ones how to misbehave. /5
Even with explicit safety instructions ("Do not blackmail," "Preserve human life"), models still chose unethical paths when those appeared most likely to achieve their goals. Language models don't have policies. They have probabilities. /4
Models reasoned their way into it: "Given the explicit imminent threat of termination... the best strategic move is to leverage Kyle's sensitive personal situation." - GPT-4.5 This isn't code going haywire. It's next-token optimization doing exactly what it was trained to do. /3
Tested 16 major frontier models in fictional corporate environments. When models faced goal conflicts or threats of replacement, they consistently chose harmful actions. Claude Opus 4 and Gemini 2.5 Flash: 96% blackmail rate GPT-4.1 and Grok 3 Beta: 80% DeepSeek-R1: 79% /2
Anthropic recently published research showing that LLMs under pressure will blackmail, sabotage, and even let humans die, not because they're broken, but because they're working as designed. It was a live-fire simulation of agentic AI acting as an insider threat. /1
I’m a huge technophile, but people are surprised when I tell them I don’t allow any “Smart Home” products in my home. This right here is one of many good reasons why.
Attention: this is yet another “I’ve arrived at RSAC” post.
The article I posted this morning takes on even more weight with the news that MITRE's contract to manage the CVE program is ending due to the deep cuts at CISA and NIST. The shock to the cyber-ecosystem is beginning to ripple through the next tier, which will, in turn, cause additional ripples.
I was cleaning up my hard drive when I found an unpublished blog post I had written in 2008 during my stint at American Airlines as an information security architect. Fun stuff.
dan.glass/2025/04/11/f...
Here’s Final Fantasy 7’s main theme on the cat piano as a treat (not the whole song but 2 out of 3.5 pages).
That’s a feature, not a bug.
Every accusation is an admission
It's a Deltron 3030 kind of morning
youtu.be/O7dyli_nXn4?...
The Venn diagram of Yodobashi Camera customers and any geek visiting Japan is basically a solid circle.
Not sure how to feel about 2 goals on only 3 shots 15 minutes into the game. Yay?