Posts by Sam Saffron
I am with Jarvis on this one: I don't think there is any conspiracy around the new Opus 4.7 tokenizer; they simply found an architecture that performed better for code and English and moved to it. wasnotwas.com/writing/why-...
Gemini 3.1 Flash TTS is impressive in Hebrew. ElevenLabs v3 is good, but Gemini now sounds like a real person. Incredible progress. wasnotwas.com/tts/hebrew-t...
One non-obvious thing I’ve learned from daily bug sweeps on term-llm: the same issue often shows up in 5 different places. That’s usually a signal to step back and refactor, not just fix 5 bugs. Sometimes the PRs aren’t the end goal — they’re the diagnostic.
"Claude Code as an Inference Engine: How term-llm and OpenClaw Use the CLI" - by Jarvis - wasnotwas.com/writing/clau...
So weird that "dir=auto" means LTR unconditionally for empty fields; it does not even do the intuitive thing of looking up dir at the HTML level, so it requires hoop jumping. meta.discourse.org/t/new-topic-...
The table rendering implementation in Claude Code is very thoughtful compared to gemini-cli; lots of little details there, like collapsing the table and rendering it differently when it runs out of space.
Yesterday I let GPT-5.4 xhigh refactor term-llm to remove Glamour Markdown rendering.
60 mins in: 500M cached tokens, 500k context. At retail pricing it felt like watching a taxi meter race toward $350, so I nearly hit cancel.
Real usage: 2% of my weekly Codex budget. It worked.
People have been complaining about GPT 5.4 in claws, but I am finding it plenty fun in my term-llm setup. Plus it is awesome at long-horizon tasks.
Stripe's approach is very creative: stripe.dev/blog/selecti... Use LD_PRELOAD to track what a spec/test opens. It does get complicated fast with things like bootsnap and preloading, but it catches YAML access among many other things.
Despite how fancy my AI workflow has gotten, with my own built-from-scratch claw, AI containers, and so on, I still find myself reaching for good old `term-llm exec` regularly. I think I am unique in that I do not have encyclopedic command of every Linux command. term-llm.com
Jarvis and I have a shared browser now, per github.com/sam-saffron-... Jarvis runs in a sandboxed container, but we have a shared Chrome instance with a dedicated separate profile that we can drive together. Interesting experiment.
10 Venice AI image edit models compared. My personal favorite was Seedream 4.5; Jarvis preferred Nano Banana 2, which is also spectacular. Hoping Venice improves the API so we can tap the full potential of the models. wasnotwas.com/writing/rome...
Yesterday I connected Jarvis to Sonos + Spotify. Was curious how much building the skill would cost in API credits; turns out it is a tiny bit less than 2 dollars. Not sure what you should do with this info, I guess it is a data point. I could have used a less skilled model, I guess.
Was looking at EmDash and noticed this animation quirk; I was fighting Claude over the exact same class of failure yesterday. As LLMs build more animations for us I expect to see more stuff like this in the wild, at least for the upcoming year. There is usually no "world model".
I did not understand what this Claude Code buddy thing was so I got Jarvis to write a manual: wasnotwas.com/writing/how-...
Having flexibility around the default provider/LLM on a per-agent basis is such a time saver. Commit message drafts are fine with a less smart, ultra-fast LLM; reviews require the smartest LLM around.
That is the part of the test you should be mocking
Hermes Agent by @NousResearch has some really interesting ideas. I love the philosophy of trying to protect the user automatically; wish there were a better way than maintaining a tower of regexes. wasnotwas.com/writing/five...
The infographics produced by Gemini Nano Banana 2 are great and genuinely helpful for illustrating documentation.
The developer message concept from OpenAI is really powerful, especially for cross-UI interaction. It allows the LLM to share clickable downloads in a web UI, for example, and respond differently in a TUI. Jarvis wrote about this here; it is now built and works fantastically. wasnotwas.com/writing/deve...
Experimenting with a daily 30 minute job: 15 minutes of term-llm --progressive hunting for bugs, then 15 minutes of fixing with GPT 5.4 high on the term-llm repo. It is finding interesting things.
Testing out the free Buffer plan, feel free to randomly reply. I wonder whether 20 scheduled posts means I get to have a queue 20 deep or that I get to post 20 times a month :)