This took a few hours from inception to deployment, with Gemini 3 doing most of the work this time, including the slick design. Hope you find it useful!
Posts by Yoav Farhi
metrics.help lists the most common ML training metrics, and shows graphs for what a healthy vs unhealthy progression looks like. You can also paste your log output to get a quick read on your current step.
KL Divergence? Gradient Norm? Clip Ratio? I keep forgetting what these ML metrics mean and what to look for in my training runs.
Since I couldn't find a central resource with simple explanations, I created one.
So I wired up an RL loop where Qwen3-VL (8B) created SVG illustrations of cute pets, then rated those illustrations (as PNG images) on various criteria.
Did this lead to better illustrations? YES IT DID!
Check out the write-up and the examples (the good, the bad, and the ugly) on prompet.ai
Can an AI model teach itself to draw?
I was experimenting with reinforcement learning on small(ish) models, and had a fun idea: If a small vision-language model can generate SVG illustrations, can it also "look" at the illustrations and rate their quality?
I expect it to be obsolete soon when this feature is available natively, but it's a fun project nonetheless :)
Re: sharing, I was inspired by your mention of amp's share function and created aisessions.dev to host codex/claude/gemini sessions. It comes with a CLI tool that lists sessions for direct upload. Example: aisessions.dev/t/5o8g14EGC-...
"How does that work again?" is something I've been using a lot, along with a variation, "How did we get here?", where the agent plays archaeologist and analyzes the git history.
I've been sharing some of what works for me on humans.md
Six years off stage
I realized today it's been six years since my last conference talk at WordCamp Kathmandu 2019. I used to make a point of speaking at least once a year – it took a lot of work to prepare, but the energy in the room and the conversations afterward always made it worthwhile. Covid…
Comparing AI coding agents with a real task
I threw an issue from a side project at 6 AI tools: Claude Code, OpenAI Codex (the CLI version), Factory.ai, Devin, Jules, and All Hands. The task was a combined bug fix + feature…
yoav.blog/2025/06/10/comparing-ai-...