Posts by Tom Johnson
Coding agents take bad metadata at face value. They won’t reconcile five different naming conventions for the same entity.
This is a fundamental and underappreciated asymmetry: humans accumulate skepticism as a kind of professional immune system. AI agents don’t.
AI has no intuition for “that’s sus.” It has none of the scar tissue that experienced developers accumulate over time.
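For illustration, here’s a minimal sketch (with hypothetical names) of the kind of reconciliation humans do instinctively and agents taking metadata at face value skip: collapsing several naming conventions onto one canonical entity key.

```python
import re

def canonical_key(name: str) -> str:
    """Collapse snake_case, kebab-case, camelCase, PascalCase, and
    space-separated names into one lowercase snake_case key."""
    name = re.sub(r"[-\s]+", "_", name.strip())
    # Insert an underscore at camelCase/PascalCase boundaries before lowercasing.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return re.sub(r"_+", "_", name).lower()

# Five conventions, one entity. A human spots this instantly; an agent
# reading metadata at face value sees five different entities.
variants = ["userAccount", "user_account", "user-account", "UserAccount", "User Account"]
assert len({canonical_key(v) for v in variants}) == 1
```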
Exactly. And with (lack of) flossing, the only person who suffers is you. With data quality, it's the whole team plus whatever agent you've handed operator-level trust to.
Once the signal becomes unreliable, people stop responding to it.
The same dynamic is playing out with AI-generated PRs.
The Law of False Alerts: “As the rate of erroneous alerts increases, operator reliance, or belief, in subsequent warnings decreases.”
Too many alerts and people stop reading them. Too many false positives and people stop trusting them.
Traditional observability tools were built to answer a fundamentally different question than debugging agents: is the system healthy?
AI debugging agents need to answer: what exactly happened, where did it break down, and what is the fix?
Everyone in the room nods when you say “garbage in, garbage out.” It’s the first thing developers say when you ask them about AI and data quality.
It’s also, apparently, the last thing they think about when they connect their observability stack to a debugging agent.
The tooling the community has built is extraordinary. But the gap between what telemetry shows you and what you actually need to debug AI-generated code in production is growing faster than the tools are.
Check out my talk: sched.co/2HJVo
Observability Summit talk “From Data Dumps to Smart Context” by Thomas Johnson, Multiplayer CTO and cofounder
Hot take for a room full of observability practitioners:
Logs, traces, and metrics were designed for a world where humans wrote and reviewed every line of code. That world is gone.
I'm speaking at @cncf.io's Observability Summit North America in Minneapolis on May 21–22.
Turns out “is that a problem?” has a pretty clear answer.
I'm joining this panel 👇 today and I'll be bringing a slightly uncomfortable perspective. 👀
Looking forward to an honest conversation with Jon Haddad, Michele Mancioppi, and Amy Tobey about what it takes to make agentic observability actually work.
📅 Today, Wed, 18 MAR 2026
🕘 10:00 AM PDT | 1:00 PM EDT | 6:00 PM CET
A panel on agentic observability with @tomjohnson3.bsky.social, @rustyrazorblade.com, @michele.dash0.com, and @renice.bsky.social. Hosted by @dash0.com and @leaddev.com.
Save your spot: leaddev.com/event/a-blue...
More of my thoughts on this topic: beyondruntime.substack.com/p/system-fir...
System-first o11y has its function, but more and more devs are realizing that effective debugging requires a session-first approach with a single timeline that correlates:
• User interactions
• Console errors
• Network req/res
• Backend traces and spans
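A sketch of the single-timeline idea above, assuming each tool exports events tagged with a session id and timestamp (the field names here are hypothetical, not any particular vendor’s schema):

```python
from typing import Iterable

def session_timeline(session_id: str, *sources: Iterable[dict]) -> list[dict]:
    """Merge events from separate tools (user interactions, console
    errors, network req/res, backend traces) into one chronological
    timeline for a single session."""
    events = [e for src in sources for e in src if e.get("session_id") == session_id]
    return sorted(events, key=lambda e: e["ts"])

ui      = [{"session_id": "s1", "ts": 1, "kind": "click", "target": "checkout"}]
console = [{"session_id": "s1", "ts": 3, "kind": "console_error", "msg": "TypeError"}]
net     = [{"session_id": "s1", "ts": 2, "kind": "http", "status": 500},
           {"session_id": "s2", "ts": 2, "kind": "http", "status": 200}]

# One timeline: click -> failing request -> console error
timeline = session_timeline("s1", ui, console, net)
assert [e["kind"] for e in timeline] == ["click", "http", "console_error"]
```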
The productivity narrative around AI coding tools is real but incomplete.
Lines of code shipped is the wrong metric when your efficiency gains get eaten by stopping to fix a ton of bugs.
The new bottleneck is knowing what your code actually does.
This chart should scare you 👇. 96% of developers don't fully trust AI-generated code, yet only 48% always review it before it ships.
Sit with that for a second.
First time speaking at a @cncf.io event. If you're going to be in Minneapolis, let's connect. ☕️👇
My talk will be all about the hard lessons from moving an MCP server into production and how to design for AI agents and effective debugging.
Full schedule: observabilitysummitna26.sched.com
More of my thoughts on this topic here: beyondruntime.substack.com/p/spec-driven-developmen...
If your team isn’t discussing Spec-Driven Development or how your design decisions are documented, versioned, and shared, you’re undermining your AI tooling strategy.
AI agents (especially when running in parallel) don’t operate well on vague intent. They need precise, well-reasoned specifications to make consistent decisions.
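As a sketch of what “precise, well-reasoned specifications” can mean in practice, here’s a hypothetical machine-checkable spec record (field names and thresholds are my illustration, not a standard) that rejects the vague intent agents stumble on:

```python
# Fields a hypothetical spec/ADR record must carry so an agent can act
# on it consistently: identity, versioning, the decision, the reasoning.
REQUIRED = {"id", "version", "decision", "rationale"}

def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec is
    precise enough to hand to an agent."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - spec.keys())]
    if len(spec.get("decision", "")) < 10:
        problems.append("decision too vague to act on")
    return problems

spec = {
    "id": "ADR-014",
    "version": "1.2.0",
    "decision": "All service-to-service calls go through the gateway with mTLS.",
    "rationale": "Uniform auth and a single place to enforce timeouts.",
}
assert validate_spec(spec) == []          # documented, versioned, actionable
assert validate_spec({"id": "ADR-015"})   # vague intent gets flagged
```

The point isn’t this particular schema; it’s that a spec an agent consumes should be documented, versioned, and checkable rather than living in someone’s head.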
Most engineers tune out when they hear “system design.”
It feels like overhead.
Until AI forces you to care about it.
If your data is scattered across tools, aggressively sampled, and missing payloads … AI can't magically correlate it for you.
You're automating a broken workflow.
Full article: beyondruntime.substack.com/p/a-major-incident-will-...
Why is this worrisome? Because the latest State of Code Developer Survey from Sonar reports that the share of code that is AI-generated or significantly AI-assisted will jump to 65% by 2027.
This effectively means that very soon (if not already) a major incident will be traced back to an AI coding tool.
So the net effect of AI-assisted development is that we’ve offloaded the part developers are generally comfortable with (writing code) and left them with the harder parts (system design, reviews, debugging), without the context that writing the code themselves would have built.
Reading and understanding someone else’s code is significantly harder than writing code yourself. AI-generated code is, in every meaningful sense, someone else’s code.
Adding AI to legacy observability practices won't make debugging faster.
It'll just amplify the problem.
The talk covers the modern telemetry data problem, why most MCP implementations inherit broken observability practices, and the path to self-healing systems that can actually act on the right data.
Full agenda: leaddev.com/leaddev-lond...