The "testing mindset" isn't a personality trait. It's trained. Developers often don't think about quality the way QA engineers do because they haven't had the same deliberate practice. That's fixable, but only if the culture reinforces it too.
Posts by Kato Coaching
Before you conclude AI isn't worth it, measure how long correction actually takes. One experiment, a real task, less than a day.
Free guide: resources.kato-coaching.com/5-ai-experiments
#SoftwareTesting #QA #AIinTesting
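A back-of-the-envelope version of that experiment, sketched in Python. Function name and numbers are illustrative, not from the guide:

```python
def ai_worth_it(manual_minutes, prompt_minutes, correction_minutes):
    """Net minutes saved by using AI on one real task.

    manual_minutes:     your honest estimate of doing the task by hand
    prompt_minutes:     time spent writing prompts and iterating
    correction_minutes: time spent fixing what the AI produced

    Positive result: the AI came out ahead. Negative: correction ate the gain.
    """
    return manual_minutes - (prompt_minutes + correction_minutes)
```

For example, a 90-minute task with 10 minutes of prompting and 40 minutes of correction still nets 40 minutes saved; the same task with 85 minutes of correction is a loss. The point is writing the numbers down, not the formula.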
The Easter Bunny has 10,000 eggs to hide across 47 gardens before sunrise. His team suggests using AI to plan the hiding spots.
Before handing a task to AI, run it through a simple framework. Here is how the Easter Bunny uses this framework to avoid a disaster. 🐇🥚
Getting the task type wrong makes the model choice irrelevant. The scorecard tells you what kind of solution actually fits before you commit to one.
https://resources.kato-coaching.com/scorecard
#SoftwareTesting #AItools #QA
TIL: it's much easier to get Claude Code to play nicely with the Google Workspace MCP than it is to get Gemini to use it. What the hell? Yes, I am experimenting with a multi-agent setup, but man, it's painful.
AI gave you 12 scenarios. Now what? The list is raw material, not a test plan, and your judgment is what turns it into one. New video on how to triage what AI gives you: https://youtu.be/0awmamL1l2c
#softwaretesting #AIinTesting
☝️☝️☝️
Agile gave us JIRA. DevOps gave us pipelines. Neither answered the harder question. AI evals is repeating the pattern: the tooling is excellent, but the question underneath it remains.
kato-coaching.com/the-hard-part-of-ai-eval...
#AITesting #QualityEngineering #SoftwareTesting
Most AI pilots produce impressions, not evidence. An acceptable error rate isn't a feeling. It's a number you write down before the experiment starts.
https://resources.kato-coaching.com/scorecard
#SoftwareTesting #AItools #QA
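Writing the number down first can be as simple as a constant in the evaluation script. A minimal sketch, with a hypothetical threshold:

```python
# Pre-registered BEFORE the pilot starts, not chosen after seeing results.
ACCEPTABLE_ERROR_RATE = 0.05  # illustrative threshold

def evaluate(needed_correction):
    """needed_correction: list of booleans, one per AI output,
    True where a human had to correct the output."""
    error_rate = sum(needed_correction) / len(needed_correction)
    verdict = "pass" if error_rate <= ACCEPTABLE_ERROR_RATE else "fail"
    return error_rate, verdict
```

One correction in twenty outputs passes a 5% threshold; two fail it. Either way, the experiment ends with a verdict instead of an impression.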
"The profession keeps arriving at “quality is systemic” as though it’s a fresh revelation, every time a new technology makes the verification frame obviously inadequate. It happened with continuous delivery. [...] It is happening now with AI agents."
Every AI demo works because the vendor picked the use case. When you're back at your own codebase, the conditions are different. Define your problem before you evaluate the tool.
https://resources.kato-coaching.com/scorecard
#SoftwareTesting #AItools #QA
The vague prompt gets you a generic list. Two minutes of context - scope, users, risk angle - gets you scenarios worth thinking about. Short video on how to close that gap.
https://youtu.be/55SLTvKq7rE
#softwaretesting #AIinTesting
The AI evals field calls it a rubric. If you've done serious testing work, you probably know it by a different name.
https://kato-coaching.com/the-anatomy-of-a-metric/
#SoftwareTesting #AITesting
When AI adoption is handed to IT, you get a deployment. Deciding what good output looks like and how to verify it belongs to the people doing the work.
resources.kato-coaching.com/scorecard
#QualityEngineering #TechLeadership
When teams say the AI tool isn't working, I ask about the process before the AI arrived. Usually the requirements were vague all along. The tool is just making that visible.
https://resources.kato-coaching.com/scorecard
#SoftwareTesting #AIinTesting #QA
Six months into an AI pilot with no success criteria is just six months of accumulated impressions.
You can't pass or fail a hope.
I am working on a food tracker for my very specific needs, and Claude just turned into every developer I ever worked with.
"The hard part is scalability, not automation." That line from session one of "AI evals and analytics" confused me. Session two explained it.
Full write-up in my blog: kato-coaching.com/the-ai-evals-field-chose...
#AIEvals #SoftwareTesting #QA
Updated my website this week — it should finally be clear what I do and how to work with me. Courses, workshops, and 1:1 coaching for QA professionals.
kato-coaching.com
If the output is slop regardless of how you phrase it, the problem isn't the prompt. It's the use case.
Free scorecard: https://resources.kato-coaching.com/scorecard
Before I wrote today's post, I defined what good looked like: gives value, doesn't rely on outrage, sounds like me. Writing to a clear brief changes the experience. So does diagnosing a draft. Testers do this before they run anything.
The AI correction loop usually starts before the tool is opened, at the moment someone chose the wrong use case for it. Testers already know how to ask whether a tool suits a problem. That skill just hasn't been applied here yet. More soon.
The skills QA professionals already have (defining success criteria, testing behaviour, not trusting metrics at face value) are exactly what's missing from most AI integrations. I'm learning AI evals to understand why.
Session one: kato-coaching.com/what-i-dont-understand-about-ai-evals-yet/
Ran my workshop "Deciding Fast" on Tuesday with a software team in Sweden. Everyone in one room sharing computers, no breakouts. The workshop is built for remote, so I adapted. 6 of 8 participants rated it Good or Excellent. Best response to "what will you do differently?": "Set clearer success condition."
#SoftwareTesting #AITesting
"Tell me what you're uncertain about." "Push back if my assumptions are wrong." Only 30% of people give instructions like these. The model defaults to confident and agreeable.
https://www.anthropic.com/research/AI-fluency-index
#SoftwareTesting #AITesting #AILiteracy
The strongest predictor of AI fluency, per Anthropic's research: iteration. Treating the first response as a draft, not an answer. 5.6x more likely to question reasoning. Familiar territory if you work in testing.
https://www.anthropic.com/research/AI-fluency-index
#SoftwareTesting #AITesting
Sounds really interesting! I hope you can share outside of that conference presentation, I’d love to hear more when you have it.
That’s a great approach. I presume you don’t want to spoil your punchline by telling us how it’s going?
Six months in, and nobody can say whether the AI is actually working. Anecdotes, yes; evidence, no. That gap is the most common thing I see in QA teams right now.
How do you measure if the new licence is worth the money?
#softwaretesting #QA #AItools
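One way to make the licence question concrete, as a sketch. The function name and the sample numbers are illustrative; plug in your own measurements:

```python
def licence_worth_it(monthly_cost, hours_saved_per_month, loaded_hourly_rate):
    """True if the measured time savings cover the licence fee.

    monthly_cost:          what the seat actually costs per month
    hours_saved_per_month: measured, not estimated (see the correction-time post)
    loaded_hourly_rate:    fully loaded cost of an hour of the team's time
    """
    return hours_saved_per_month * loaded_hourly_rate >= monthly_cost
```

At a $60 loaded hourly rate, a $30 seat pays for itself at half an hour of genuine savings per month. The hard part is measuring the hours saved honestly, corrections included.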
18% of testers I surveyed said their top AI frustration is not bad output. It is that the tools have no sense of test strategy.
The AI is not wrong. It is indiscriminate.
#AITesting #SoftwareTesting #TestStrategy