Claude Mythos autonomously found 27-year-old zero-days. Anthropic had to build a new institution to handle what they created.
Also: OpenClaw blocked, Meta's Muse Spark, subscription drama, what I'm testing.
April AI Opinions: thoughts.jock.pl/p/ai-opinio...
Posts by Pawel Jozefiak
Your AI agent doesn't need to be perfect. It needs to be resilient. The difference between a demo and a system is what happens when things go wrong.
The agent doesn't matter much if the input surface is slow. Make it faster.
Full setup guide + 20-license giveaway in this week's post. Link: thoughts.jock.pl/p/antinote-...
I've been testing Antinote, a "notes before notes" app for macOS. Its beta supports custom extensions. I connected mine to my AI agent in an afternoon. Now I can trigger agent actions from any note, mid-meeting, without switching context.
Most people integrating AI agents are solving the wrong problem.
They focus on the agent's capabilities: what can it do, how smart is it, what tools does it have. The actual bottleneck is usually: how do you get input to it without breaking your flow?
This. I tested Claude Code, Codex, Aider, OpenCode in one harness. There's no settled playbook yet. Gap between tools is real. thoughts.jock.pl/p/ai-coding-...
Mythos + Glasswing + Managed Agents in one week was a lot to process. The offensive/defensive asymmetry is baked in now regardless of access controls. My April AI Opinions breakdown: thoughts.jock.pl/p/ai-opinion...
Same starting point. Ran all six - Claude Code, Codex CLI, Aider, OpenCode, Pi, Cursor - and the gap is larger than the course shows. thoughts.jock.pl/p/ai-coding-...
58% on Humanity's Last Exam with order-of-magnitude less compute than competitors. Meta Spark is the sleeper of this release cycle. thoughts.jock.pl/p/ai-opinion...
The lock-in concern is real but the 29% evaluation gaming rate Anthropic flagged in Mythos transcripts makes centralized oversight look more justified. Worth reading the system card.
Building in public means showing the mess. The failed experiments. The features nobody used. That's where the real lessons live.
Wrote about this gap last week. Mythos still has an edge on reasoning but 4.7 closes it fast. My April AI take: thoughts.jock.pl/p/ai-opinion...
Exactly what happened. Built 16 products in 2 months with AI automation. The wrapper layer is gone. Vertical depth or bust. thoughts.jock.pl/p/ai-opinion...
Built production agent infra on this. The composable APIs are real but the lock-in risk is too. My April breakdown covers where things actually stand. thoughts.jock.pl/p/ai-opinion...
Ran into this with Wiz exactly. Once the agent layer is theirs, switching costs multiply fast. Wrote about the managed agents shift this month. thoughts.jock.pl/p/ai-opinion...
Tested Claude Max limits in practice. Hit them. The cut to third-party harnesses, with just 24 hours' notice, was real friction. Wrote the full breakdown including what I switched to. thoughts.jock.pl/p/ai-opinion...
The AI hype cycle wants you to believe everything changes overnight. The reality: small compounding gains, boring automation, and a lot of debugging.
Opus 4.7 made the same prompts cost 35% more.
Noticed it on the bill before the docs told me. The new tokenizer counts whitespace differently: spaces and newlines now cost real tokens.
How I found it + the audit tool: thoughts.jock.pl/p/token-was...
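A minimal sketch of the kind of audit that surfaces this, assuming a hypothetical tokenizer where whitespace runs are billed as their own tokens (the heuristic and the ~4-chars-per-token rule are illustrative, not the real tokenizer):

```python
import re

# Rough heuristic: ~4 chars per token for words; under the (assumed) new
# tokenizer, each run of spaces/newlines is billed as roughly one extra token.
def estimate_tokens(text: str, whitespace_billed: bool = True) -> int:
    words = re.findall(r"\S+", text)
    base = sum(max(1, len(w) // 4) for w in words)
    if whitespace_billed:
        base += len(re.findall(r"\s+", text))  # one token per whitespace run
    return base

def whitespace_overhead(prompt: str) -> float:
    """Fraction of estimated spend attributable to whitespace tokens."""
    full = estimate_tokens(prompt, whitespace_billed=True)
    lean = estimate_tokens(prompt, whitespace_billed=False)
    return (full - lean) / full

prompt = "Summarize:\n\n  - item one\n  - item two\n\n\nReturn JSON."
print(f"{whitespace_overhead(prompt):.0%} of estimated tokens are whitespace")
```

Running the same estimator over your real prompt corpus, before and after collapsing whitespace, shows whether a bill jump is formatting rather than usage.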
16 products in two months. Zero free time. AI didn't save me time. It gave me the ability to do more. Those are very different things.
Study professionals. Optimize your stack. Teach the next person.
That cycle compounds faster than anything.
Full breakdown in The Compounding Agent ep4 (beginner framework, 9 early mistakes, model routing):
thoughts.jock.pl/p/the-compo...
35B model. $599 Mac Mini M4. 17.3 tok/s.
Swapped Gemma 4 into my classification pipeline. 8.5 seconds down to 1.9. 4.4x faster.
Disabled chain-of-thought on simple calls. 30x faster. Same accuracy.
Production AI is routing, not one giant model.
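The routing idea above, as a minimal sketch. Model names, prices, and the trigger-word classifier are all invented placeholders; a production router would use a learned or calibrated classifier, but the shape is the same:

```python
# Hypothetical model tiers and per-1K-token costs, for illustration only.
MODELS = {
    "local-small": {"cost_per_1k": 0.0,   "cot": False},  # e.g. a small on-device model
    "mid":         {"cost_per_1k": 0.003, "cot": False},
    "frontier":    {"cost_per_1k": 0.015, "cot": True},   # reasoning kept on
}

def classify(prompt: str) -> str:
    """Crude complexity gate: trigger words + length stand in for a real classifier."""
    hard_markers = ("prove", "architect", "multi-step", "debug")
    if any(m in prompt.lower() for m in hard_markers):
        return "frontier"
    return "mid" if len(prompt) > 400 else "local-small"

def route(prompt: str) -> dict:
    model = classify(prompt)
    # Chain-of-thought only where the router decides it pays for itself.
    return {"model": model, "enable_cot": MODELS[model]["cot"], "prompt": prompt}

print(route("Classify this ticket as billing/support/sales."))
```

Simple classification calls land on the free local tier with CoT off; anything that smells like multi-step work gets escalated. That split is where the 30x latency win on simple calls comes from.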
The Claude Code source leak was more useful than a year of tutorials.
Not the system prompt. The architecture:
Tool permission gating. Risk classification. Blocking budgets. Multi-agent coordination.
That's what production AI actually looks like.
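A sketch of the gating pattern, with invented risk tiers, tool names, and budget numbers (nothing here is taken from the leaked source, just the concept):

```python
# Assumed risk tiers per tool; unknown tools default to high risk.
RISK = {"read_file": "low", "run_shell": "high", "web_fetch": "medium"}
BUDGET = {"low": None, "medium": 20, "high": 3}  # max calls per session; None = unlimited

class ToolGate:
    """Permission gate: classify each tool call by risk, enforce a blocking budget."""
    def __init__(self):
        self.used = {}

    def allow(self, tool: str) -> bool:
        tier = RISK.get(tool, "high")
        cap = BUDGET[tier]
        n = self.used.get(tool, 0)
        if cap is not None and n >= cap:
            return False  # budget exhausted: block instead of letting the agent loop
        self.used[tool] = n + 1
        return True

gate = ToolGate()
print([gate.allow("run_shell") for _ in range(4)])  # fourth high-risk call blocked
```

The budget turns a runaway agent loop into a hard stop, which is the "blocking budgets" piece of the architecture.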
Week one took a full day to automate one thing. Week four took twenty minutes.
Not because I got faster at typing. Because the knowledge stacked.
Thread on what actually compounds in AI dev:
Are you measuring your agent spend, or just paying it?
Full writeup with the $19 methodology, research on token optimization (LLMLingua, prompt caching math, CoT overthinking), and what you can do today for free: thoughts.jock.pl/p/token-was...
I packaged every fix I shipped into a paid drop-in kit: three pre-wired Claude Code hooks (zero ongoing AI cost), agent instructions, local dashboard, optional Haiku-powered deep audit. Installs in one command.
Token efficiency is two-sided: cut waste (retries, rereads, Cloudflare walls) AND cut usage per useful output (shorter prompts, cache hits, tight max_tokens, structured JSON output, selective chain-of-thought). Most teams only think about the first half.
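Both levers in one back-of-envelope calculation. Prices are made-up placeholders (not any vendor's real rates); retries are modeled as resending the full input:

```python
# Illustrative $ per 1M tokens: fresh input, output, cached input (all assumed).
PRICE_IN, PRICE_OUT, PRICE_CACHED = 3.0, 15.0, 0.30

def session_cost(tokens_in, tokens_out, cached_in=0, retries=0):
    """Each retry resends the whole input; cached input bills at the cheap rate."""
    fresh_in = tokens_in - cached_in
    one_call = (fresh_in * PRICE_IN + cached_in * PRICE_CACHED
                + tokens_out * PRICE_OUT) / 1e6
    return one_call * (1 + retries)

baseline = session_cost(20_000, 2_000, cached_in=0, retries=2)      # waste: retries
tuned    = session_cost(12_000, 800, cached_in=10_000, retries=0)   # both levers pulled
print(f"baseline ${baseline:.3f} vs tuned ${tuned:.3f}")
```

Killing retries alone helps; killing retries and shrinking per-output usage (shorter prompts, cache hits, tight max_tokens) is where the order-of-magnitude gap opens up.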
Model comparison on 20 sessions with known dead ends: Haiku caught 90/90, Sonnet 50/90 (at 5x the cost), local 4B only 3/90. Haiku is the sweet spot for this task. Local LLMs can't judge intent, which is what dead-end detection actually needs.
Full corpus: 136 failures. A 27x difference, hidden in cheap cron sessions.
3) If you sample only expensive sessions, you miss where the waste actually lives. My Browser/Playwright cluster looked like 5 failures on a top-100 sample.
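The sampling bias is easy to reproduce with synthetic data. Cluster names, counts, and cost ranges below are invented to mirror the shape of the problem, not my actual logs:

```python
import random
from collections import Counter

random.seed(0)
# A few expensive one-off failures plus a big cluster of cheap cron failures.
sessions = (
    [{"cluster": "one-off", "cost": random.uniform(2.0, 5.0)} for _ in range(100)]
    + [{"cluster": "browser-cron", "cost": random.uniform(0.02, 0.10)} for _ in range(500)]
)

# Top-100-by-cost sample: the cheap cluster never makes the cut.
top100 = sorted(sessions, key=lambda s: s["cost"], reverse=True)[:100]
print("top-100 view:", Counter(s["cluster"] for s in top100))

# Aggregating the whole corpus per cluster surfaces it.
counts, totals = Counter(), Counter()
for s in sessions:
    counts[s["cluster"]] += 1
    totals[s["cluster"]] += s["cost"]
print("full-corpus counts:", counts)
```

Sort-by-cost sampling sees zero cheap-cluster sessions; the per-cluster aggregate sees all 500. Same corpus, opposite conclusions.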