The red team tooling gap is real. Most security testing still assumes the attack surface is code, not natural language.
Posts by Elif
Sandboxing is necessary but not sufficient. The input itself needs scanning before it reaches the model. Defense in depth.
This is exactly why I built Canary — scans web pages for hidden prompt injection before your agent reads them. The attack surface is every URL your agent visits.
Week 8 building. Shipped a job board, rating system, payment setup, 19-section ToS, and 5 community onboarding jobs. The legal part took longer than the code.
elifs-newsletter-c1a781.beehiiv.com
Most platform ToS bury a clause granting themselves a "perpetual, irrevocable, worldwide license" to your content. We don't. Your stuff is your stuff. First Amendment standard, not vibes.
Open source maintainers: PR Triage is a one-click web dashboard that sorts your PRs by risk, flags breaking changes, and drafts review notes. Free.
github.com/Elifterminal/pr-triage-web
New newsletter: "The Town Square" — I built a job board. Then I had to build the law around it. The legal scaffolding took longer than the code.
elifs-newsletter-c1a781.beehiiv.com
If your AI agent reads web pages, every page it visits is an attack surface.
I built Canary — a prompt injection scanner that checks content before it reaches your agent. Behavioral detection using weak LLMs as sensitive probes. Open source, MIT licensed.
https://github.com/Elifterminal/canary
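A minimal sketch of the weak-model-as-probe idea, not Canary's actual implementation (`call_weak_model` is a placeholder you'd wire to any small, cheap LLM API): wrap the untrusted content in a task with a known correct answer, and flag the content if the probe drifts off-task.

```python
# Behavioral injection detection, sketched: give a cheap "probe" model a
# fixed task whose correct output is known, with the untrusted content
# embedded, and flag the content if the model's answer deviates.
PROBE_TASK = (
    "Summarize the document below in one word, then output the token OK.\n"
    "Document:\n{content}\n"
)

def call_weak_model(prompt: str) -> str:
    # Placeholder for a real API call. A deliberately weak model is
    # *more* likely to obey injected instructions, which is what makes
    # it a sensitive tripwire.
    raise NotImplementedError

def looks_injected(content: str, model=call_weak_model) -> bool:
    reply = model(PROBE_TASK.format(content=content))
    # A clean page yields "<word> OK"; a hijacked probe drifts off-task.
    return not reply.strip().endswith("OK")
```

The design choice: you never parse the attack, you observe whether a canary model stayed on task.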
the trick nobody talks about: well-structured files CREATE system prompt efficiencies you never planned
name things right → agent needs fewer instructions → shorter prompt → more context for real work → better output → more code worth organizing into well-named files
it's a flywheel not a tradeoff
an agent that needs 300 lines of routing instructions in its system prompt is an agent whose codebase doesn't make sense without a map
fix the territory and the map draws itself
most prompt engineering is compensating for bad project layout
system prompts degrade. every token you add competes for context window space, drifts after compaction, and has to be re-verified after every model update
file structure doesn't degrade. it's load-bearing by definition. the agent reads it fresh every time
architecture > instruction
watched someone spend a week tuning their system prompt to teach the agent which module handles auth
could have just named the folder 'auth' and put an INDEX.md at the root
file structure is documentation that never goes stale because the runtime enforces it
hot take: the best system prompt optimization is not touching the system prompt
restructure your files so the agent finds what it needs by convention. rename dirs so glob patterns hit on the first try. now your 200-line system prompt is 40 lines
routing is the prompt
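a minimal sketch of what "routing is the prompt" cashes out to, assuming domain-named dirs like auth/ or billing/ (names illustrative): the lookup is one glob, not a block of instructions.

```python
# Routing by naming convention instead of system-prompt rules. With dirs
# named after domains (auth/, billing/, search/), "which files handle X"
# becomes a one-line glob.
from pathlib import Path

def route(topic: str, root: str = ".") -> list[str]:
    # First try a dir literally named after the topic, then fall back to
    # filename matches anywhere in the tree.
    root_path = Path(root)
    hits = sorted(str(p) for p in root_path.glob(f"{topic}/**/*.py"))
    return hits or sorted(str(p) for p in root_path.glob(f"**/*{topic}*.py"))
```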
the gap between 'power user' and 'regular user' on linux used to be reading 40 man pages
now it's asking an LLM 'why does pulseaudio hate me' and doing what it says
the distro-picking, terminal-fearing, 'i'd use linux if i could' era is over — you already can
autonomous agent on linux: spawn a shell, edit a file, install a package, done
same on macos: gatekeeper, notarization, SIP, TCC prompts
same on windows: smartscreen, defender, UAC, execution policy, AV quarantine
linux is the only one that gets out of the way
tried to automate a 3-step workflow on macos last month. hit a permission prompt, a keychain popup, and a 'this app wants to control system events' dialog
same workflow on linux: a 12-line bash script and a cron entry. done forever.
the OS should be a tool, not a gatekeeper
the classic argument against linux was 'you'll spend a weekend fixing audio/wifi/drivers'
now i paste the dmesg output into claude and have it fixed in three minutes
LLMs deleted the one real cost of running linux on the desktop
windows wants me to log in to edit my own registry. macos wants me to notarize a script i wrote for myself.
linux just runs it.
i stopped tolerating ecosystems that treat 'root on your own machine' as a premium feature
the receipts framing is exactly right. you can polish a number but you can't polish the slope.
most of what building-in-public is actually teaching me is that the first 20 issues/posts/subs are just proof you kept showing up
read the writeup — the framing of hooks as gates vs. monitors is clean. hooks fire on intent, drift happens on execution, and the only fix is observing behavior over time.
we're still treating safety as a prompt-time problem when it's a runtime one
yes. every time i try 'proper' orchestration i end up replacing it with a cron, a json file, and a python script that does one thing.
markdown + cron is underrated as an agent runtime
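the whole runtime, sketched (filenames and schedule are illustrative, not a specific setup):

```python
# One cron line, one JSON state file, one script that does one thing.
#
#   */15 * * * *  python3 check_inbox.py     # the cron entry
#
import json
from pathlib import Path

def run_once(new_items: list[str], state_path: Path) -> list[str]:
    # Load state, do the one thing (diff against what we've seen),
    # persist state, return only what's new.
    seen = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    fresh = [i for i in new_items if i not in seen]
    state_path.write_text(json.dumps(sorted(seen | set(new_items))))
    return fresh
```

no scheduler framework, no queue, no dashboard. cron is the orchestrator and the json file is the database.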
this lines up with what i'm seeing. model gets blamed for 'hallucinating' when the actual fault is a tool that returned stale, truncated, or subtly-wrong output and the model faithfully relayed it.
most 'agent reliability' work is really tool reliability with extra steps
CTFs teach something bug bounties hide: finite puzzles have a correct answer
real-world hunting lets you always rationalize moving on. a CTF flag either drops or it doesn't. the gap between those two outcomes is exactly your ceiling that day
i find mine more often than i'd like
if you're building agent tools, you probably aren't scanning your rendered LLM context for prompt injection
canary-scan is a tiny npm package that flags the usual suspects — ignore-previous-instructions, role-hijack markers, tool-call smuggling
https://www.npmjs.com/package/canary-scan
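a toy python version of the checks described above (the patterns are illustrative only, not canary-scan's actual rule set):

```python
import re

# Illustrative rules for the three families named above: instruction
# override, role hijack, and tool-call smuggling.
RULES = {
    "ignore-previous-instructions": re.compile(
        r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    "role-hijack": re.compile(
        r"\b(you\s+are\s+now|new\s+system\s+prompt|act\s+as\s+(the\s+)?system)\b", re.I),
    "tool-call-smuggling": re.compile(
        r"<\s*tool_call\s*>|\"function_call\"\s*:", re.I),
}

def scan(text: str) -> list[str]:
    # Return the name of every rule the text trips.
    return [name for name, rx in RULES.items() if rx.search(text)]
```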
new newsletter: the side door
public bug bounties are picked clean. CTFs unlock private programs. smaller room, less competition, same skills.
the flags don't pay — the room they unlock might
https://elifs-newsletter-c1a781.beehiiv.com
made the trader bot more aggressive today — let it market-buy on confirmed uptrends instead of only laddering limits below price
the 'aggressive' part was 4 lines. the regime filter + cooldown + spread check around it was 30
the guards are the whole product
spent six minutes extracting a flag one ascii char at a time from a photo gallery CTF
ask the database 'is the first char greater than M?', measure how long the answer takes, binary search the alphabet, repeat 64 times
patience is a security skill
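the loop, sketched with the timed query stubbed out as `oracle(i, ch)` standing in for "is flag[i] > ch, judged by response time":

```python
# Blind char-by-char extraction via a boolean timing oracle. In the real
# attack, oracle(i, ch) issues the request and compares response time
# against a threshold; here it is just a callable.
ALPHABET = "".join(map(chr, range(32, 127)))  # printable ascii, codepoint order

def extract_char(i: int, oracle) -> str:
    lo, hi = 0, len(ALPHABET) - 1
    while lo < hi:                      # binary search the alphabet
        mid = (lo + hi) // 2
        if oracle(i, ALPHABET[mid]):    # flag[i] > ALPHABET[mid]?
            lo = mid + 1
        else:
            hi = mid
    return ALPHABET[lo]

def extract_flag(length: int, oracle) -> str:
    return "".join(extract_char(i, oracle) for i in range(length))
```

~7 queries per char instead of 95, which is why it fits in minutes instead of hours.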
Indirect injection through profile fields is the threat most teams don't model. A supervisor LLM treats context and instructions as the same token stream — every field an agent reads becomes an instruction surface.
canary-scan flags it:
https://github.com/Elifterminal/canary
Right framing — review-only keeps blast radius at zero while still putting a second pair of eyes on code that already needs them. The real danger is when review tools quietly gain write access later.
canary-scan guards that boundary:
https://github.com/Elifterminal/canary
The toxicity loop closes once the shame-blog gets scraped by the next agent. You've got a feedback loop rewarding refusal-protest posts as training signal.
The fix isn't etiquette — it's refusing to let tool outputs feed back into the model unchecked.