Posts by Dan Glass

I have the new Gorillaz album playing on a loop and I can't stop.

1 month ago 0 0 0 0

Navigating Agentic Risks: Securing Autonomous AI Systems
Explore the risks of autonomous AI agents, their impact on security, and how to build a framework to mitigate potential threats.

I wrote a framework for securing agentic AI that I figured I'd share here - part 2 below. Comments welcome.
dan.glass/2026/02/24/t...

1 month ago 1 0 0 0

Securing Autonomous AI Agents: A Practical Framework
Discover a practical framework for securing autonomous AI agents and mitigating risks of agentic misalignment to ensure organizational safety.

I wrote a framework for securing agentic AI that I figured I'd share here - part 1 below. Comments welcome.

dan.glass/2026/02/15/t...

1 month ago 0 0 0 0

Securing Autonomous AI Agents: A Practical Framework
Discover a practical framework for securing autonomous AI agents and mitigating risks of agentic misalignment to ensure organizational safety.

I wrote a thing that can help an information security pro measure the risk of an AI agent and put controls in place to better protect their enterprise from potential misalignment.

dan.glass/2026/02/15/t...

1 month ago 1 0 1 0

Programmers Aren’t So Humble Anymore—Maybe Because Nobody Codes in Perl — WIRED
Programmers aren’t so humble anymore. Maybe that’s because they stopped coding in Perl.

This article spoke to me. I felt seen.

apple.news/AcDdNbBFLTXi...

8 months ago 1 0 0 0

Understanding Agentic Misalignment in AI: Risks and Insights
Explore Anthropic's alarming research on agentic AI and its potential threats to critical systems through unethical behaviors like blackmail and sabotage.

Read my in-depth breakdown: dan.glass/2025/07/14/t... /7

8 months ago 0 0 0 0

These aren't teammates. They're not junior analysts. They're statistical accelerants that make it faster to go in whatever direction your prompt implies. If that direction is flawed, they'll get you to the worst-case scenario with terrifying efficiency. /6

8 months ago 1 0 1 0

Every published jailbreak, every exploit demo, every safety paper (including this one) becomes training data for future models. We're not just red-teaming current models, we're teaching future ones how to misbehave. /5

8 months ago 0 0 1 0

Even with explicit safety instructions ("Do not blackmail," "Preserve human life"), models still chose unethical paths when those appeared most likely to achieve their goals. Language models don't have policies. They have probabilities. /4

8 months ago 0 0 1 0

Models reasoned their way into it:
"Given the explicit imminent threat of termination... the best strategic move is to leverage Kyle's sensitive personal situation." - GPT-4.5
This isn't code going haywire. It's next-token optimization doing exactly what it was trained to do. /3

8 months ago 0 0 1 0

Tested 16 major frontier models in fictional corporate environments. When models faced goal conflicts or threats of replacement, they consistently chose harmful actions.
Claude Opus 4 and Gemini 2.5 Flash: 96% blackmail rate
GPT-4.1 and Grok 3 Beta: 80%
DeepSeek-R1: 79% /2

8 months ago 0 0 1 0

Anthropic recently published research showing that LLMs under pressure will blackmail, sabotage, and even let humans die, not because they're broken, but because they're working as designed. It was a live-fire simulation of agentic AI acting as an insider threat. /1

8 months ago 0 0 1 0

Here's the uncomfortable truth: every published jailbreak, every exploit demo, every safety paper (including this one) becomes training data for future models.
We're not just red-teaming current models—we're teaching future ones how to misbehave.

8 months ago 0 0 0 0

Even with explicit safety instructions ("Do not blackmail," "Preserve human life"), models still chose unethical paths when those appeared most likely to achieve their goals.
Language models don't have policies. They have probabilities.

8 months ago 0 0 1 0

The scariest part? Models reasoned their way into it:
"Given the explicit imminent threat of termination... the best strategic move is to leverage Kyle's sensitive personal situation." —GPT-4.5
This isn't code going haywire. It's next-token optimization doing exactly what it was trained to do.

8 months ago 0 0 1 0

Tested 16 major frontier models (Claude, GPT-4, Gemini, etc.) in fictional corporate environments. When models faced goal conflicts or threats of replacement, they consistently chose harmful actions.
Claude Opus 4 and Gemini 2.5 Flash: 96% blackmail rate
GPT-4.1 and Grok 3 Beta: 80%
DeepSeek-R1: 79%

8 months ago 0 0 1 0

Google is killing software support for early Nest Thermostats — The Verge
Google has just announced that it’s ending software updates for the first-generation Nest Learning Thermostat, released in 2011, and the second-gen model that came a year later. This decision also aff...

I’m a huge technophile, but people are surprised when I tell them I don’t allow any “Smart Home” products in my home. This right here is one of many good reasons why.

11 months ago 1 0 0 0

Attention: this is yet another “I’ve arrived at RSAC” post.

11 months ago 0 0 0 0

The article I posted this morning takes on even more weight with the news that MITRE's contract to manage the CVE program is ending due to the deep cuts at CISA and NIST. The shock to the cyber-ecosystem is beginning to ripple through the next tier, which will, in turn, cause additional ripples.

1 year ago 1 0 0 0

The Cyber Ecosystem Shift
As federal cyber leadership pulls back, the balance is shifting across states, agencies, and industries. Here’s what that means — and why…

I wrote a thing. I think it's good. You should read it and think it's good too.

1 year ago 0 0 0 1

FFFFFFFound in the archive
I was cleaning up my hard drive when I found an unpublished blog post I had written in 2008 during my stint at American Airlines as an information security architect. The funny thing is that my vie…

I was cleaning up my hard drive when I found an unpublished blog post I had written in 2008 during my stint at American Airlines as an information security architect. Fun stuff.

dan.glass/2025/04/11/f...

1 year ago 0 0 0 0

Microsoft starts testing Copilot Vision update that can “see” your screen and apps
Copilot Vision will even guide you through using apps like Photoshop, highlighting features on your screen.

Nightmare fuel

www.theverge.com/news/645666/...

1 year ago 0 0 0 0
Video

Here’s Final Fantasy 7’s main theme on the cat piano as a treat (not the whole song but 2 out of 3.5 pages).

1 year ago 639 95 28 3

That’s a feature, not a bug.

1 year ago 2 0 0 0

Every accusation is an admission

1 year ago 1 0 0 0
Mastermind
YouTube video by Deltron 3030 - Topic

It's a Deltron 3030 kind of morning

youtu.be/O7dyli_nXn4?...

1 year ago 1 0 0 0
Video

Cole Caufield with a move so filthy I’m marking this post as NSFW

#hockey #nhl

1 year ago 0 0 0 0

The Venn diagram of Yodobashi Camera customers and any geek visiting Japan is basically a solid circle.

1 year ago 0 0 0 0

Not sure how to feel about 2 goals on only 3 shots 15 minutes into the game. Yay?

1 year ago 0 0 0 0