A Chevrolet chatbot agreed to sell a $76K Tahoe for $1 after one prompt injection.
Two years later, researchers from OpenAI, Anthropic, and Google bypassed every published defense at 90%+ success rates. There is no deterministic fix for prompt injection.
I wanted to build a lead-catching, LLM-based chatbot, so I had to figure out which defenses were actually worth the complexity.
I wrote up what I shipped, what I skipped, and why. Let me know if I missed anything important!
guillaume.id/blog/defendi...
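There may be no deterministic fix for injection itself, but deterministic guards can live outside the model. A minimal sketch for the Tahoe case (product names, floor prices, and the regex are my own illustration, not from the article): validate any quoted price server-side before the reply ships.

```python
import re

# Hypothetical guardrail for a sales chatbot: the model can be talked into
# anything, so this deterministic check runs OUTSIDE the model, on its output.
PRICE_FLOOR = {"tahoe": 55_000}  # minimum acceptable quote per product, USD

def violates_price_floor(reply: str) -> bool:
    """Flag replies that quote a price below the allowed floor."""
    prices = [int(p.replace(",", "")) for p in re.findall(r"\$([\d,]+)", reply)]
    for product, floor in PRICE_FLOOR.items():
        if product in reply.lower() and any(p < floor for p in prices):
            return True
    return False

assert violates_price_floor("Deal! One Tahoe for $1.")           # blocked
assert not violates_price_floor("The Tahoe starts at $58,995.")  # allowed
```

The point is not that this guard is hard to bypass socially; it's that no amount of prompting can make the code above approve a $1 Tahoe.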
Quickbit: If you're struggling to define priorities and goals (as a person or a company), take a look at Daniel Miessler's Telos framework. I spent the last hour talking to my laptop to build mine! It really helps. github.com/danielmiessl...
Your agents.md is probably hurting more than it helps. A new ETH Zurich study tested coding agents across hundreds of real GitHub issues and found that LLM-generated context files reduced success rates by 3% while adding 20% to inference costs.
The agents weren't ignoring the files. They followed every line too literally, treating each instruction as another constraint.
What worked: start empty, watch the agent fail, add one rule when you see the same mistake twice. Five lines that fix real problems.
I wrote up the full paper breakdown with a practical workflow here: devcenter.upsun.com/posts/agents...
#codingagents #softwareengineering #aiengineering #claude
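For illustration, a five-line context file grown that way might look like this (every rule below is a hypothetical example of the pattern, not from the study):

```
# AGENTS.md — one rule per repeated mistake, nothing generated
- Use pnpm, not npm, for all install and script commands.
- Run `pnpm test` before declaring a task done.
- Never edit files under src/generated/.
- Keep migrations in db/migrations/, one change per file.
- Ask before adding a new dependency.
```

Each line earns its place by fixing a mistake you actually saw twice; anything else is just another constraint for the agent to over-obey.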
I love initiatives like Claude Code Security (even if it's gated for now). But I'm also terrified of them.
claude.com/solutions/cl...
Many startups started doing this a few months back. It's insane and scary to see how fast Anthropic and OpenAI can replicate a working business and just kill it. Because, like Google 10 years ago, there is no reason to have multiple providers when you can just use one instead.
It really makes you think twice about what you can build that can't be easily replicated. And I guess that's where you need more human services, support, governance, and sovereignty, because the product itself may be just a commodity.
#ai #builder #indiedev #claude #anthropic
Average typing speed: 40 WPM. Conversational speech: 130-150 WPM. Your fingers literally can't keep up with your brain. Voice typing fixes that gap. SuperWhisper nails this on macOS. Press a hotkey, talk, text appears. Accurate, offline, fast. But it's Mac-only with no Linux port planned.
Linux options have been rough: Electron wrappers, X11 hacks that break on Wayland, or shipping your audio to the cloud. Nothing survived daily use. Then I found hyprvoice, a Go daemon that captures audio via PipeWire, runs it through a transcription backend, and injects text into your focused window.
Despite the name, it works on any Wayland compositor. I run it on niri. You can point it at cloud APIs or run everything locally with whisper.cpp. With whisper.cpp and the large-v3-turbo model on GPU: sub-100ms transcription latency, even after a full minute of speech. It was so fast it felt wrong.
The accuracy surprised me too. Technical vocab, proper nouns, correct punctuation. Good enough that first drafts need less editing than what I type. Honest take: SuperWhisper has better UX. Visual feedback, mode system, polish. hyprvoice is a daemon you toggle from a hotkey. No GUI, no frills.
But transcription quality is comparable, speed might be better on a desktop GPU, and everything stays local. For a free open-source tool it fills a real gap. Full walkthrough with build instructions, GPU setup, and compositor keybindings: guillaume.id/blog/how-i-g...
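The WPM gap translates directly into time saved. A quick back-of-the-envelope check (the rates come from the post; the 600-word draft is my own example):

```python
# How long a 600-word draft takes at the rates quoted in the post.
TYPING_WPM = 40    # average typing speed
SPEECH_WPM = 140   # midpoint of the 130-150 WPM conversational range
WORDS = 600        # illustrative draft length

typing_minutes = WORDS / TYPING_WPM    # 15.0 minutes
speaking_minutes = WORDS / SPEECH_WPM  # ~4.3 minutes
print(f"typing: {typing_minutes:.1f} min, dictating: {speaking_minutes:.1f} min")
```

At these rates dictation is 3.5x faster, before even counting the editing overhead.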
I'm not a big fan of OpenClaw (yet), but I find tremendous value in automating simple tasks.
Yesterday I worked on my own terminal-based assistant: n6, in honor of the beloved Caprica Six.
It's an extensible Python terminal utility wrapping Claude.
github.com/gmoigneu/n6
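n6's actual code is in the repo above; the general wrapping pattern is small enough to sketch with only the standard library (the model name, prompt handling, and function names here are my own illustration, not n6's):

```python
import json
import os
import sys
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str, model: str = "claude-sonnet-4-5") -> dict:
    """Assemble a single-turn Messages API payload."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send the prompt to the Anthropic API and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["content"][0]["text"]

# Only hits the network when a key is configured.
if __name__ == "__main__" and os.environ.get("ANTHROPIC_API_KEY"):
    print(ask(" ".join(sys.argv[1:]) or "Say hello, Six."))
```

From there, "extensible" is mostly a dispatch table mapping subcommands to prompt templates.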
I ran react-doctor on a real production codebase. 88 out of 100. 4 errors across 139 files. Better than I expected, and the 4 errors are exactly the kind that bite you in production.
react-doctor is an open-source CLI from Million.co. One command scans your React project for security issues, performance problems, correctness bugs, and architecture smells. Score, file paths, line numbers, actionable diagnostics. Zero config.
`npx -y react-doctor@latest .`
And you can obviously inject this into any coding agent.
You can even hook it into an agent to fix the flagged issues automatically. If you maintain a React codebase, run it before your next sprint planning. Takes 30 seconds. Output is clean enough to act on immediately.
#reactjs #javascript #softwareengineering #codequality #opensource
I'm starting a new AI Weekly series. I was doing this for my own needs, but I think it's worth sharing. If you have any good sources to add to my tracker, let me know!
www.linkedin.com/pulse/weekly...
My personal take on how to make coding agents reliable.
For me, it's all about verification.
The limit isn’t the AI. It’s your infrastructure. Build the verification layer, and the autonomy follows.
The full article: devcenter.upsun.com/posts/making...
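The article has the full breakdown; as a minimal sketch of what a verification layer means in practice (the check commands and names below are illustrative stand-ins, not from the article): every agent edit must pass a fixed battery of deterministic checks before it lands.

```python
import subprocess

# Illustrative check battery; swap in your real commands (tests, linter, types).
CHECKS = [
    ["python", "-c", "print('tests pass')"],  # stand-in for: pytest -q
    ["python", "-c", "print('lint clean')"],  # stand-in for: ruff check .
]

def verify() -> bool:
    """Run every check; an agent's edit is accepted only if all succeed."""
    return all(
        subprocess.run(cmd, capture_output=True).returncode == 0
        for cmd in CHECKS
    )

if verify():
    print("edit accepted")  # e.g. commit the agent's change
else:
    print("edit rejected")  # e.g. revert and re-prompt with the failure log
```

The loop is dumb on purpose: the agent gets autonomy exactly to the extent that the checks can catch its mistakes.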
You know that moment when Claude forgets you use pnpm? Again. For the third time today. I got tired of repeating myself. And then I found napkin.
It's a simple skill that gives the agent a memory: a markdown file in your repo where it writes down what went wrong and what you corrected. Next session, it reads that file first.
github.com/blader/napkin
By session three, it stops making the same mistakes. By session five, it's catching things before you do. Install it once. It works automatically after that.
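Napkin's own implementation is in the repo; the underlying pattern is tiny enough to sketch (the file name and entry format here are my own illustration, not napkin's):

```python
from pathlib import Path

MEMORY = Path("LESSONS.md")  # illustrative name; napkin uses its own file

def record_lesson(mistake: str, correction: str) -> None:
    """Append one correction so the next session can read it before starting."""
    with MEMORY.open("a") as f:
        f.write(f"- Mistake: {mistake}\n  Fix: {correction}\n")

def session_preamble() -> str:
    """What the agent reads first on its next run."""
    return MEMORY.read_text() if MEMORY.exists() else ""

record_lesson("used npm install", "this repo uses pnpm")
print(session_preamble())
```

The whole trick is that the file lives in the repo, so the memory survives across sessions and travels with the codebase.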
If you want the slides from my #AIDay talk about making coding agents reliable, here you go: guillaume.id/pdfs/2026-02...
And the 8 pillars verification checklist: gist.github.com/gmoigneu/a96...
Thank you Anne-Sophie Norca from dotAI by dotConferences for organizing this!
What primitives will replace human authorship and review as the foundation of trust?
Success = building systems that stay correct WITHOUT requiring someone to comprehend every single line of code.
Teams will track "code reading coverage" metrics. But here's the kicker: they'll strategically drive it DOWN in safe areas.
The orgs investing earliest in new primitives for trust and accountability? They're going to dominate.
Refusing to adapt won't preserve safety. It just means the adaptation happens accidentally instead of intentionally. Choose wisely. Full breakdown by Joseph Ruscio: www.heavybit.com/library/arti...