
Posts by John Allspaw

Organizational Second Hit Syndrome

"The reaction to failure is not a linear function of the weight or impact of the failure. We have seen lower impact failures produce stronger reactions than high impact ones."

www.adaptivecapacitylabs.com/2026/03/07/o...

1 month ago

The people behind and part of this community have been a huge influence on my own learning and progression around systems thinking and resilience engineering. I’m glad to see the community organizing itself—and to be a member of it all.

1 year ago
Resilience Engineering 101: Part 11 Beyond Surge Capacity YouTube video by CSEL BackChannel

Resilience engineering is about how successful systems are able to modify themselves to deal with a shock that is outside of what they can handle. This video by David Woods illustrates this phenomenon using the real-life example of an ER dealing with an influx of burn patients.

youtu.be/duNMGTmZ4FA

3 months ago
Decompensation and Cascading Failures Consider the following scenario: A set of automated tasks has somehow failed to run to completion. Because a thorough fix will take some time and the tasks you need run are time-sensitive, you complet...

Part of Resilience Engineering is knowing that compensation happens, and knowing how to protect, foster, and build on this type of capacity to keep systems successful; keeping it invisible increases the chance that it reaches saturation and fails in a cascade.

Posted on @resilienceinsoftware.org

2 months ago

Folks who would respond that way completely misunderstand the concept. ‘Blameless’ doesn’t mean “getting off the hook” or not being accountable. Ugh.

4 months ago

Does anyone have first-hand experience with using an AI agent that has “participated” in the response to an actual incident in a genuinely diagnostic (or otherwise contextually specific) way?

6 months ago

Comedy is about taking chances, Av. You took a chance.

6 months ago

There are a couple of phrases I genuinely despise. One of them is "...we're at an inflection point..."

1. That's not how inflection points work.
2. Hindsight's a helluva drug, isn't it?
3. Just say you are at a loss to explain some recent surprises. It's honest and not unnecessarily dramatic.

6 months ago
Misunderstanding the blameless postmortem post Accountability vs. Punishment The biggest misunderstanding is that it seems like you’re conflating "blameless" with "no accountability." Your post claims that blameless cultures mean "people believe...

@staysaasy.bsky.social I’m not on X very much, but I finally collected my notes on the post: docs.google.com/document/d/1...

6 months ago

Hi 👋, that right there is me.

7 months ago

For leaders with expectations about how much more productive, efficient, etc. the software engineers in their org will be with AI:

1. How are you handling the "Left-Over Principle" challenges?

2. Also: customer comms about the incidents involving code produced by AI?

(Seems clear #2 depends on #1)

8 months ago

Another real-life demonstration of Ironies of Automation

8 months ago
Video
8 months ago

Incidents happen because people do things that have always worked successfully, up until the incident. Doing something that always worked in the past is completely rational!

8 months ago
What makes public posts about incidents different from analysis write-ups

Come on, @gergely.pragmaticengineer.com. You and I talked about the hindsight bias trap on your podcast. Don’t fall for this trap as well:

www.adaptivecapacitylabs.com/2021/08/22/w...

8 months ago

Yes, and when inevitably: “Oh…huh…whoa, how come you’re doing it like that?”

They’ve got a story for you. 😀

9 months ago

As if there was any other route!

9 months ago
[Incident Fest] - The Catch-22 of Executives in Incidents | LinkedIn Join us for a short-but-sweet live session that focuses on the role of executives during incident management. Although the session is prerecorded, our experts will participate in a Q&A in the comment...

I'm now talking with Hamed Silatani, Alex Hibbitt, and Beth Adele Long about the various Catch-22 situations leaders find themselves in when responding to incidents!

www.linkedin.com/events/incid...

9 months ago
[Incident Fest] - What Does Great Incident Response Look Like? YouTube video by Uptime Labs

Watching www.youtube.com/live/Uxe_mw8...

9 months ago
Labeling a root cause is predicting the future, poorly Why do we retrospect on our incidents? Why spend the time doing those write-ups and holding review meetings? We don’t do this work as some sort of intellectual exercise for amusement. Rather,…

New blog post: surfingcomplexity.blog/2025/05/15/l...

11 months ago
When a bad analysis is worse than none at all One of the most famous physics experiments in modern history is the double-slit experiment, originally performed by the English physicist Thomas Young back in 1801. You probably learned about this…

New blog post: surfingcomplexity.blog/2025/05/10/w...

11 months ago
As Richard Cook notes, in reflecting on the data collected from the SNAFU Catchers Consortium:
"It's hard to know who is monitoring the system, who is looking at things, and who is available.
We know for certain in (some organizations) that there is a pretty well defined group of people who are in high-frequency update mode more or less continuously while awake. This is a huge contributor to reducing the costs of coordination and it is invisible to everyone and therefore not on any dashboard." (Cook, personal communication)

1. PEOPLE keep things working.
2. When things break down, PEOPLE work to make the consequences much less than they might have been otherwise.

Both dynamics are, for the most part, invisible to management.
(via @lauramaguire.bsky.social's dissertation)

1 year ago
Ten Machine Requirements To Satisfy Essentials Of Joint Activity

I recently read the paper "Towards Joint Activity Design Heuristics: Essentials for Human-Machine Teaming" which I loved so much I wanted to make it easier to share. To that end, I've excerpted the Ten Heuristics from the paper here: human-machine.team with anchors for each heuristic.

1 year ago

What a great podcast! Honored to be talking about Resilience Engineering with @colettecello.bsky.social and @spamaps.org!

1 year ago
What Progress In Learning From Incidents Actually Looks Like

Two years ago, at the first LFI Conference, we spoke alongside a client of ours (Indeed) about the amazing progress they had made in learning effectively from incidents.

This is what "good" looks like with respect to learning effectively from incidents.

www.adaptivecapacitylabs.com/2025/02/28/w...

1 year ago
Episode 10 - When They go Full ITIL on You w/special guest John Allspaw YouTube video by thisisfinepodcast

@allspaw.bsky.social joined us last week to help contrast ITIL's approach of "Counting and tabulating incidents" with the resilience engineering way of looking directly at the messy reality underneath them.

www.youtube.com/watch?v=cimc...

1 year ago

Preach! youtu.be/cimcogNc02I?... 13:30-16:30

1 year ago

Three Takes on Four Concepts for Resilience Engineering Ed note: The first time I read Dr. David Woods' paper Four Concepts for Resilience Engineering, I felt so many things click in my brain. While the field of Resilience Engineering is not new, those of ...

The Resilience in Software Foundation blog included notes I wrote for the paper "Four Concepts for Resilience Engineering" in their post today: resilienceinsoftware.org/news/1149720

1 year ago

Very few know we've (@adaptivecapacity.bsky.social) been developing tools to help support us. After 7 years & 6 patents (!) we realized others were interested in integrating/commercializing "Churchkey."

We’re looking for partners interested in licensing Churchkey's IP. churchkey.info

1 year ago