I also cannot ship my work today because it’s too powerful to release publicly
Posts by Andreas Holley
medium.com/refactoring-... this blog post from Refactoring UI suggests adding similar (at the top in their example), wondering if a bunch of resources like this are in the training data, or whether it’s more examples from real sites doing this
The one on the right looks like Gabe Newell
My cat Daisy, a long haired tabby, sleeping
Fluffy
Alreet, we’ll be upgrading to veet eeet tonigheet
Of course a lot of it comes from the model, using GPT-5.4 here, but even much smaller models manage the tool calls. Also it has no system prompt, perhaps aside from whatever’s baked into the model itself, so start of chat behaviour feels odd, but incredibly quick to start responding
Screenshot showing agent chat transcript: >>> Tool call: ls({"path":"."}) Tool call: grep({"pattern":"tool|function|api|assistant|namespace","path":"."}) >>> Tool call: cat({"path":"package.json"}) Tool call: cat({"path":"README.md"}) Tool call: cat({"path":"index.ts"}) Tool call: cat({"path":"CLAUDE.md"}) >>> I took a look. Right now this is a minimal CLI coding agent with tools for: - `ls` - `cat` - `grep` - `write_file` - `ping` That’s a solid starter set, but the most obvious missing capability is: ## Best new tool to add: `run_command` A tool that executes a safe local shell command and returns stdout/stderr.
Just had a bit of an awakening moment with agents, going through fly.io/blog/everyon.... Added an ls and cat tool using the same pattern as ping. Got the agent to inspect its own source code, it suggested adding write_file, and now the agent is modifying and improving itself 🤯. All in ~200 lines TS
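For anyone curious what that pattern can look like, here's a rough sketch, not the actual code from the project. The tool names (`ls`, `cat`, `write_file`) come from the post; the `Tool` type, the `callTool` helper, and the handler bodies are assumptions made up for illustration, and the model/API side is left out entirely.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical tool shape: a description for the model plus a local handler.
type Tool = {
  description: string;
  run: (args: Record<string, string>) => string;
};

// Tool names mirror the ones in the post; the bodies are illustrative guesses.
const tools: Record<string, Tool> = {
  ls: {
    description: "List entries in a directory",
    run: ({ path: p }) => fs.readdirSync(p).sort().join("\n"),
  },
  cat: {
    description: "Read a file as UTF-8 text",
    run: ({ path: p }) => fs.readFileSync(p, "utf8"),
  },
  write_file: {
    description: "Write text to a file, replacing any existing content",
    run: ({ path: p, content }) => {
      fs.writeFileSync(p, content, "utf8");
      return `wrote ${content.length} characters to ${p}`;
    },
  },
};

// Dispatch one tool call requested by the model. Errors come back as text
// so the model can read them and try something else.
function callTool(name: string, argsJson: string): string {
  const tool = tools[name];
  if (!tool) return `error: unknown tool ${name}`;
  try {
    return tool.run(JSON.parse(argsJson));
  } catch (err) {
    return `error: ${(err as Error).message}`;
  }
}

// Usage: arguments arrive as a JSON string, like the tool calls in the transcript.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "agent-"));
const file = path.join(dir, "note.txt");
callTool("write_file", JSON.stringify({ path: file, content: "hello" }));
console.log(callTool("cat", JSON.stringify({ path: file }))); // hello
console.log(callTool("ls", JSON.stringify({ path: dir }))); // note.txt
```

Adding a new tool is then just another entry in the `tools` record, which is probably why an agent that can call `write_file` on its own source can extend itself so easily.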
Middle of a bunch of tool calls, or a sub agent doing something big that itself is correct, but yeah if it’s clearly doing something wrong then of course you want to stop it.
In terms of the actual output, not sure - can’t use codex at work at the moment. Feels like a nicer ux, I message now and it sees it now. With claude often I’d message and then mid typing be interrupted by a command approval prompt. With interrupting I worry about the timing being bad?
This is on Monzo's big and complex systems though, very different approach when doing something for a personal project.
I also went a bit off the deep end, with the point at which I first read the code being a (draft) PR, and honestly it was kinda cool at first but I'm now back to either babysitting or heavily scrutinising and iterating on a plan before starting to execute.
Have you used Codex for the 'steering' angle? It reads your message while continuing, whereas with claude you have to explicitly stop it, or it waits for some kind of gap to actually read your message. The message queueing in claude code just feels bad in general now
Gives off slight continvoucly morging energy, some arrows come from the holes, some come from the cheese
Where technical debt comes from
Skill issue
@samwho.dev how long is your bread or how short is your toaster
I think this one isn’t your standard spring-and-latch ‘on’ system - it looks to have a temperature dial, timer dial, and then the lever is just a ‘move the bread up and down’, rather than switching on/starting the toasting process
Considering how high tech other kitchen gadgets are these days it’s always a surprise to see how unevenly hot the heating elements on a toaster get. Wonder how much you have to pay for something that isn’t just some hot bendy wires sitting near your bread
Screenshot showing codexbar with codex usage, session with 81% left with 53 minutes until reset
Meanwhile codex with ~5x the tokens used, so if anything I'm getting ~25x the usage?? Something seems not right here. Claude code is usually _fine_ when I use just sonnet in a single instance without a bunch of sub agents.
Yeah this new agent team mode with Opus 4.6 and the pro plan are not made for each other, blew past my 5 hour limit in 11 minutes.
We have Claude with ~uncapped spend at work so I’m pretty invested on that side of things and make good use of it. But codex also lets me use the plan with Opencode so another thing to look into (I used opencode with Minimax quite a bit a few weeks ago and it was pretty close to Claude with Sonnet)
I tried out codex for some personal stuff (have a month free of the plus plan), and it seems like I have easily 10x the limits with 5.3 codex vs Claude code with the pro plan (same price, and I’ve been paying monthly since November).
The desktop app seems pretty nice too
AI can do this, whatever this is
Advert on the tube for ‘access training’ suggesting AI can’t do pipes
Can’t do what? Make an absolute mess of some pipes?
I'm seeing way more open source projects pop up because of AI - whether they'll stay maintained or are actually good is another question, but how much of existing open source is actually good and stays maintained?
In a way AI derisks the abandoned projects a bit if you can maintain it yourself now
I did this prompt and it kinda looks good?
Galaxy brain meme "same look and feel on all platforms" "stick to the platform look and feel while keeping the brand design in there" "ask claude "make the ui look good""
But what does consistent mean and what is good?
How much do you lean into the platform design language to make your app feel native, at the risk of having two (or more - what about web?) quite distinct app designs?
Another place this happened was the switch from native desktop to electron apps