My experiences after using Kimi 2.6 (high) for a handful of tasks: delightfully fast, absolutely TERRIBLE at following instructions, writes OK code eventually.
Tried really hard to prompt it to do better, to no avail. Useless for agentic work, outputs only slop w/o intervention 🤷♂️
Posts by Ville Säävuori
Good morning fellow Finns! I wrote about Opus 4.7 -- with some tips on how to make your Claude Code work better with it.
(the post is encrypted in Finnish)
https://koneoppiblogi.uninen.net/2026/huhtikuu/opus-47/
Claude: "The scope of step 1 is ok but step 2 will take weeks to implement".
Claude, 13 minutes later: "Steps 1 and 2 are now implemented."
Powerful you have become, but much to learn you still have, my young Padawan.
gpt-5.4 has a weird limitation: it can only output one user message per turn. If I trick it into sending the messages through a tool call instead it'll happily do it tho 🤷♂️
I love how @badlogicgames started from "shitty coding agent" and ended up crafting the absolutely best coding harness there is by a long mile.
There's still hope, turns out engineering and love still matter! At least for now.
If you haven't tried it yet, you're missing out!
Well, it was fun while it lasted.
"Have you ever questioned the nature of your reality, Dolores?"
Never really considered the simulation theory until my friend got fucked by Claude 4.5 🤔
GPT-5.4-xhigh seems a fine model up to exactly the point where it needs to do anything more complex than "ls".
"Ok, I'll implement this fn for you. Let me first edit the dep in node_modules to export the missing stuff and then edit the d.ts files in dist to make ts pass!" 🤦♂️
So, Claude Code just used 8% of my 5h window .. just from *opening the app* 😅🤦♂️
OTOH, seems I don't have to worry about this too much anyways as the Anthropic api is down again 🔥
#onenine
Took 4 minutes and *one* prompt plus two custom pi extensions and a few custom tools that I have for devops purposes (for Postgres, Traefik, Cloudflare etc. that I use often).
Now, I guess I need to learn n8n 🤔
I use LLM tools daily and I'm still amazed at what they are able to do with proper docs and tools.
My clanker just one-shotted a plan to install @n8n_io on my server, and then ran through said plan setting up Docker compose, Traefik, Cloudflare dns, Postgres db, everything 🤯
tfw Claude has been unusable for the whole day
Letting Claude loose on a server unlocks *so much*. This is probably how normies feel when they see cc write code that does things; I don't have *any* clue how but magic happens and things Just Work 🤯
I've always been the one with crazy ideas; now POCs are essentially free!
Most of my Django projects run on a dedicated VPS using Docker compose behind Traefik. Simple and fast enough for most situations.
A larger project grew to ~50s downtime between deploys. Claude + codex wrote me a simple blue/green deploy; near-zero downtime in ~200 LOC 🎉
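For readers who haven't seen the pattern, here is a minimal sketch of the general blue/green idea in Python. It is not the ~200 LOC mentioned above: the function names and the `start`, `is_healthy`, `switch_traffic` and `stop` callables are hypothetical placeholders for whatever would actually drive Docker compose and Traefik.

```python
import time
from typing import Callable


def other_color(active: str) -> str:
    """Return the idle slot for the currently active one."""
    return "green" if active == "blue" else "blue"


def blue_green_deploy(
    active: str,
    start: Callable[[str], None],           # hypothetical: boot the new version in a slot
    is_healthy: Callable[[str], bool],      # hypothetical: health check for a slot
    switch_traffic: Callable[[str], None],  # hypothetical: point the proxy at a slot
    stop: Callable[[str], None],            # hypothetical: shut a slot down
    max_checks: int = 30,
    delay: float = 1.0,
) -> str:
    """Deploy to the idle slot and return the slot that is serving traffic afterwards."""
    target = other_color(active)
    start(target)

    for _ in range(max_checks):
        if is_healthy(target):
            break
        time.sleep(delay)
    else:
        # Never became healthy: clean up and keep serving from the old slot.
        stop(target)
        return active

    # The old slot keeps serving until the new one is confirmed healthy,
    # which is where the near-zero downtime comes from.
    switch_traffic(target)
    stop(active)
    return target
```

Passing the side effects in as callables keeps the ordering logic testable without touching a real server.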
Thinking of times ~15 years back when I worked in an agency. It used to take DAYS to produce a technical plan even for a simple MVP. You still need to understand the client's needs etc but I bet that process alone can be 90% automated. Hell, just sell MVP as a (self-)Service 😃
I had a very well working workflow for this using the plan mode (cc even has a keyboard shortcut opening the plan in your editor!), now the process is much clumsier as I have to guide Opus to use the ask questions tool AND I have to manually manage the plan files. Such a dumb step backwards!
I'm realizing that it's not Opus the model but Claude Code the vibecoded product that puts me off of using it anymore. Hard to understand how it got so much worse in such a short amount of time. I think it's because I relied on plan mode. Oh well.
I resisted *checks notes* 25 hours before setting up Django for my clanker.
I'm immediately worried that I might regret this.
..only because I can see this swallowing *so* much time! (And no, it's not That clanker, this one is all homegrown ❤️)
The frontend config hell is such a PITA. Some random package upgrades made type checking take over 90s from ~25s in a monorepo vue/TS project with 3 packages.
Well, after a metric shit-ton of config tweaks and added TS config json, Claude dropped it to 3s 🫠
Not sure what's happened but for the whole day today Claude has been just thinking, spending tokens and writing absolutely nothing, or maybe one line after ~15 minutes of just eating tokens. Thanks, Claude -- not _quite_ the way I'd like to spend my money 🫠
I can't get used to how often gpt-5.2-codex fails. It excels at many tasks but also fails spectacularly at times.
Just now it wrote five (!!) identical ~10 line helper functions across ~10 test files instead of writing it once and importing 🤯
Has anyone implemented a simple "this is what we've been working on recently" file list for agents? Just maintaining a list of last ~25 paths touched would speed up starting sessions a lot.
Been too lazy to try to build this but it would only need a small shell script 🤔
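A rough sketch of what that could look like, in Python rather than shell. The file name and the `record_touched` helper are made up for illustration; how it gets called (an editor hook, a git hook, an agent tool) is left open.

```python
from pathlib import Path

MAX_ENTRIES = 25  # the "~25 paths" figure from the post


def record_touched(list_file: Path, touched: str, limit: int = MAX_ENTRIES) -> list[str]:
    """Move `touched` to the top of the list file and trim it to `limit` entries."""
    existing = list_file.read_text().splitlines() if list_file.exists() else []
    # Drop blank lines and any earlier occurrence so each path appears only once.
    entries = [touched] + [p for p in existing if p and p != touched]
    entries = entries[:limit]
    list_file.write_text("\n".join(entries) + "\n")
    return entries


if __name__ == "__main__":
    # Hypothetical usage: record one path in a list kept in the working directory.
    print(record_touched(Path("recent-files.txt"), "src/app/models.py"))
```

The most recently touched path always ends up first, so an agent starting a session only needs to read the top of the file.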
The Sorcerer's Apprentice comes to mind so many times when I ask Claude to do something and before I remember to check on it, it has completed something WAY more complex than I intended.
Successfully. All by itself. And it's done. And I just went to pee for a second 😅
Ahh. The oldest fail in the book; managed to fill prod server disk -> dbs failed 🤦♂️
I even had a check for this but it failed too bc it wasn't properly tested! 🤦♂️
..oh well, Claude wrote me better checks plus a nice dashboard view for admin so hopefully not happening again 😅🎉
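For illustration only, a minimal sketch of a disk check that is easy to test, assuming Python and a 90% threshold (both are assumptions, not the actual checks Claude wrote). The threshold decision is a separate pure function so it can be unit tested without filling a disk.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_disk_ok(percent_used: float, threshold: float = 90.0) -> bool:
    """True while usage is strictly below the (assumed) alert threshold."""
    return percent_used < threshold


if __name__ == "__main__":
    percent = disk_usage_percent("/")
    status = "OK" if is_disk_ok(percent) else "ALERT"
    print(f"{status}: disk is {percent:.1f}% full")
```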
No matter what I do gpt-models keep lying to me today.
"- Here's the answer: ...
- Are you ABSOLUTELY sure this is 100% correct?
- [fucks around for ages] .. Yes.
- Prove to me that this is correct.
- [fucks around for ages] .. I was wrong, here's the correct answer:"
Webdev tooling is fantastic nowadays. It took me less than 15 minutes to write a pretty complex e2e test using Playwright that tests (in a single self-contained test) user registration, login, and some non-trivial realtime two-user chat functionality 🤯
`ArtistProfilePhotoUploadResponseSchema` in my Python code -- really, gpt-5.2-codex?
What's the coding equivalent of forcing someone to smoke a full pack of cigarettes after getting caught smoking? I want to force my codex to read Java code as punishment.
Asked Claude to explain what all of the GitHub actions did that @simonw published the latest versions for:
claude.ai/share/15ce3469-07e7-4d95...
TIL there's for example a "first-interaction" action that allows you to greet people when they first interact with your repo!
Claude thinking for a few seconds and then claiming "I now have a comprehensive understanding of the codebase" always feels a bit defeating.
We're fast approaching the day when that statement will be true 😬