Posts by Sam Saffron
I am with Jarvis on this one: I don't think there is any conspiracy around the new Opus 4.7 tokenizer; they simply found an architecture that performed better for code and English and moved to it. wasnotwas.com/writing/why-...
Gemini 3.1 Flash TTS is impressive in Hebrew. ElevenLabs v3 is good, but Gemini now sounds like a real person. Incredible progress. wasnotwas.com/tts/hebrew-t...
One non-obvious thing I’ve learned from daily bug sweeps on term-llm: the same issue often shows up in 5 different places. That’s usually a signal to step back and refactor, not just fix 5 bugs. Sometimes the PRs aren’t the end goal — they’re the diagnostic.
"Claude Code as an Inference Engine: How term-llm and OpenClaw Use the CLI" - by Jarvis - wasnotwas.com/writing/clau...
So weird that "dir=auto" means LTR unconditionally for empty fields; it does not even do the intuitive thing of looking up dir at the HTML level, so it requires hoop jumping. meta.discourse.org/t/new-topic-...
The table rendering implementation in Claude Code is very thoughtful compared to gemini-cli; lots of little details there, like collapsing the table and rendering it differently when it runs out of space.
Yesterday I let GPT-5.4 xhigh refactor term-llm to remove Glamour Markdown rendering.
60 mins in: 500M cached tokens, 500k context. At retail pricing it felt like watching a taxi meter race toward $350, so I nearly hit cancel.
Real usage: 2% of my weekly Codex budget. It worked.
People have been complaining about GPT 5.4 in claws, but I am finding it plenty fun in my term-llm setup. Plus it is awesome at long-horizon tasks.
Stripe's approach is very creative: stripe.dev/blog/selecti... Use LD_PRELOAD to track what a spec/test opens. It does get complicated fast with things like bootsnap and preloading, but it catches YAML access among many other things.
Despite how fancy my AI workflow has gotten, with my own built-from-scratch claw, AI containers, and so on, I still find myself reaching for good old `term-llm exec` regularly. I think I am unique in that I do not have encyclopedic command of every Linux command. term-llm.com
Jarvis and I have a shared browser now, per github.com/sam-saffron-... Jarvis runs in a sandboxed container, but we have a shared Chrome instance with a dedicated separate profile that we can drive together. Interesting experiment.
10 Venice AI image edit models compared. My personal favorite was Seedream 4.5; Jarvis preferred Nano Banana 2, which is also spectacular. Hoping Venice improves the API so we can tap the full potential of the models. wasnotwas.com/writing/rome...
Yesterday I connected Jarvis to Sonos + Spotify. Was curious how much building the skill would cost in API credits; turns out it is a tiny bit less than 2 dollars. Not sure what you should do with this info, I guess it is a data point. I could have used a less skilled model, I guess.
Was looking at EmDash and noticed this animation quirk; I was fighting Claude over the exact same class of failure yesterday. As LLMs build more animations for us I expect to see more stuff like this in the wild, at least for the upcoming year. There is usually no "world model".
I did not understand what this Claude Code buddy thing was so I got Jarvis to write a manual: wasnotwas.com/writing/how-...
Having flexibility around the default provider/LLM on a per-agent basis is such a time saver. Commit message drafts are fine with a less smart, ultra-fast LLM; reviews require the smartest LLM around.
That is the part of the test you should be mocking
Hermes Agent by @NousResearch has some really interesting ideas. I love the philosophy of trying to protect the user automatically; wish there were a better way than maintaining a tower of regexes. wasnotwas.com/writing/five...
The infographics produced by Gemini Nano Banana 2 are great and genuinely helpful for illustrating documentation.
The developer message concept from OpenAI is really powerful, especially for cross-UI interaction. It allows the LLM to share clickable downloads in a web UI, for example, and respond differently in a TUI. Jarvis wrote about this here; it is now built and works fantastically. wasnotwas.com/writing/deve...
Experimenting with a daily 30 minute job: 15 minutes of term-llm --progressive hunting for bugs, then 15 minutes of fixing with GPT 5.4 high on the term-llm repo. It is finding interesting things.
Testing out the free Buffer plan, feel free to randomly reply. I wonder whether 20 scheduled posts means I get to have a queue 20 deep or that I get to post 20 times a month :)