This is the area where I'd really struggle to build out the UI any better myself. Again, fine for an MVP/POC, but you'd want an experienced designer and UX person to improve everything for a real product launch.
UI-wise, it started quite well. The original designs were generic and basic, but I spent time in several design sessions guiding it to think out of the box and try 5 different designs. Then I'd pick the bits I liked and keep refining until we had something reasonably decent.
From an output perspective, it's a mixed bag. The backend Razor Pages application code works but isn't how I'd write it. For building an MVP/POC it's kinda okay, but it would need refactoring before launch. Its EF models are not always that sensible, so it needs some handholding.
Even with subagents, it doesn't seem to leverage them particularly well when left to its own devices.
Workflow-wise... not great. I'm back to manually launching a specific agent profile and guiding it through each stage: planning, implementing, testing. The recent 5 hr session limits have made my planned flow impractical. It burns through usage too quickly.
The cynic in me thinks this is a great strategy: be intentionally dumb and burn my allowance. It used to handle this just fine. On top of that, it's ignoring my skill guiding it on how to run TUnit tests, and I watch it try something like 8 times with an invalid command. So it's also wasting time.
Sonnet 4.6 is annoying me! Gave it a clear openspec with 8 deliverables. Asked it to implement deliverables 1 to 3 (more token efficient), but it randomly implemented tasks from across all deliverables, completing none of them fully, burning a huge chunk of my #claudecode Pro 5 hour window!
I find the sessions very useful for refining the idea, and I often ask them to come up with 10 "outside of the box" ideas to consider. Often this teases out some useful thoughts we can iterate on further until we have a really interesting and detailed plan for the feature that we can then spec.
One of the things I enjoy about AI agents is the brainstorming phase. I have CEO, Requirements Analyst, PM and Solution Architect personas that I use to take a feature idea and have them challenge me on the functionality and technical aspects. I ask them to keep questioning until they're 95% confident my intent is understood.
I've been using Sonnet 4.6, and so far I'd say GPT 5.4 is easily comparable in quality. I'm building from very refined specs, though. It seems faster and faffs less to achieve the goal. I'm interested to see how it does on a more complex, open-ended problem. Opus may still be the one to beat.
Codex has got me back to being productive on my side project this morning. With the occasional prompt, it's churning through deliverables on my next feature, around preparing the children to head out to a local zoo. Claude usage was at 77% of the 5hr window after the first small change; Codex is on its 10th change with 65% remaining.
I'm on PTO for a week, but when I'm back, I'm interested to see the burn rate on work stuff using the Enterprise plan. I hit $55 in one day, and bar one refactoring, most tasks were smaller and focused on a small number of CI and build files + some tests. Seemed excessively high for what I achieved.
Same. Before the last week or so, I'd been planning to upgrade to Max for a side project, but it's a no-go until trust is won back that it'll be worth the personal investment. Codex is working out for now, and the £20 plan is enough to unblock progress while I wait to see how this shakes out.
I've been pure Sonnet and still hitting massive usage. I started a new session with a basic prompt, and after one file read I was at 18% of the 5 hr usage. Gave up after that.
And the results are in. For the next deliverable, same basic refactor against three files (3 tasks), Codex with GPT5.4 medium used 4% of my 5hr window.
I used to be able to work through 1, usually 2, complete openspecs for new features in the 5hr window (Pro). Now I'm struggling to get 3 deliverables (3 of 11) from one change. The last one burned 26% for 3 small tasks touching two files.
I'm trying out Codex CLI and the Pro subscription this morning. Claude Code seems to have become unusable with the whole token usage fiasco. I've removed most of my helpful infrastructure to save tokens, but even plain openspec apply tasks are burning through my usage, and the changes are slow too.
Asking Copilot why it put some unrelated tests into an existing test file. Here is my prompt: "The only thing I want to challenge, is why are the dbcontext tracking guards in #sym:FeatureFlagTests rather than a distinct well named class?" And here is the relevant part of the response: "I put them in FeatureFlagTests as a shortcut because I was already modifying that file, but that conflates concerns and makes discoverability harder for future maintainers."
Another example of AI acting like a lazy developer (and yes, humans do this too). Here @github #copilot decided to ignore conventional file separation and take "a shortcut" by just throwing the new tests into an existing test file. Overall it saved me time on a chore, but I still have to watch closely.
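To make the point concrete, the sort of separation I was asking for looks roughly like this. It's a minimal sketch: the FeatureFlag model, AppDbContext, and test class names are hypothetical stand-ins for the real app's types, and I'm assuming TUnit's [Test] attribute and fluent assertion namespaces plus the EF Core in-memory provider.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using TUnit.Assertions;
using TUnit.Assertions.Extensions;
using TUnit.Core;

// Hypothetical model and context standing in for the real app's types.
public class FeatureFlag
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public bool Enabled { get; set; }
}

public class AppDbContext(DbContextOptions<AppDbContext> options) : DbContext(options)
{
    public DbSet<FeatureFlag> FeatureFlags => Set<FeatureFlag>();
}

// The tracking guards get their own well-named class instead of being
// bolted onto FeatureFlagTests, which keeps them easy to discover.
public class DbContextTrackingGuardTests
{
    private static AppDbContext CreateContext() =>
        new(new DbContextOptionsBuilder<AppDbContext>()
            .UseInMemoryDatabase("tracking-guards")
            .Options);

    [Test]
    public async Task ReadQueries_DoNotTrackEntities()
    {
        await using var context = CreateContext();
        _ = await context.FeatureFlags.AsNoTracking().ToListAsync();

        // A no-tracking read should leave the change tracker empty.
        await Assert.That(context.ChangeTracker.Entries().Count()).IsEqualTo(0);
    }
}
```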
The final preview of #ILSpy 10 is ready for download github.com/icsharpcode/...
I'm continuing to experiment with subagents and an automated orchestrated workflow for feature implementation, with QA and code review. It's kinda working, but it burns tokens, several steps just fail, and the orchestrator (main session) ends up doing the work again. Needs more investigation!
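For context, Claude Code subagents are defined as Markdown files with YAML frontmatter under .claude/agents/. A rough sketch of the kind of QA reviewer I'm wiring into the workflow; the name, description, and prompt here are all hypothetical:

```markdown
---
name: qa-reviewer
description: Reviews implemented changes against the spec and runs the tests.
tools: Read, Grep, Bash
---

You are a QA reviewer. When the orchestrator hands you a completed
deliverable, run the test suite, check the diff against the openspec
tasks, and report any failures back rather than fixing them yourself.
```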
I thought I had the dotnet-data plugin installed, but it seems it's not on this machine. Maybe (hopefully) that would have helped here.
A notepad showing features or phases being implemented, with columns identifying when I had manually triggered the various workflow steps. Basically a large table of check marks to visually track my manual session workflow before subagents.
One of the reasons I'm spending time learning about #ClaudeCode subagents is that while my workflow has been quite productive, I'm becoming the slowest part of the process, manually orchestrating various sessions. Keeping track became hard and tiresome, so I ended up keeping notes!
This morning I am playing with #ClaudeCode subagents and building a test workflow.
Yuk! Caught Claude Code using AsyncLocal to work around a pooled EF DbContext rather than the recommended pattern of a scoped factory. Code reviews are still important, people! The challenge is that I find reviewing/reading code I haven't written quite slow, as I need to build up the context. #dotnet
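For anyone wondering what the recommended pattern looks like: register a pooled context factory and create a short-lived context per unit of work, instead of smuggling one instance across async flows. A minimal sketch, reusing the hypothetical FeatureFlag/AppDbContext types from the test sketch earlier; FeatureFlagService and IsEnabledAsync are made up for illustration:

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Registration, e.g. in Program.cs, so contexts are pooled and
// created on demand via the factory:
// builder.Services.AddPooledDbContextFactory<AppDbContext>(
//     o => o.UseSqlServer(connectionString));

// Hypothetical service; assumes the AppDbContext/FeatureFlag types
// from the earlier sketch.
public class FeatureFlagService(IDbContextFactory<AppDbContext> contextFactory)
{
    public async Task<bool> IsEnabledAsync(string name)
    {
        // Each unit of work rents its own context from the pool and
        // returns it on dispose; no AsyncLocal needed to share one.
        await using var context = await contextFactory.CreateDbContextAsync();

        return await context.FeatureFlags
            .AsNoTracking()
            .AnyAsync(f => f.Name == name && f.Enabled);
    }
}
```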
A Claude Code exchange showing that Claude has "noted" a UX issue for a future polish pass, but when I ask where the note is, the response is "Nowhere yet - just words. Let me add it".
Validate everything with AI!
One unhealthy habit introduced with my personal Claude Pro subscription is wanting to maximise my usage. I now find myself starting a session as soon as I wake to maximise the number of 5 hour windows I get. Also juggling prompts to get the maximum 5hr and weekly token usage.
I did have a hook set up on Stop (which may have been the wrong place) to run my devlog skill, which includes the note about prompting for a commit decision. I've removed the hook, and the skill seems to work fine on its own now. Any good resources for proper hook usage?
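For reference, the shape of what I had: Claude Code hooks live in settings (e.g. .claude/settings.json), and a Stop hook runs a shell command when the main agent finishes responding. A rough sketch of the kind of thing I'd wired up; the command string is just a placeholder, since JSON can't carry comments:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "your-devlog-command-here"
          }
        ]
      }
    ]
  }
}
```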
I must admit, for my current learning project I have been. My goal was to see if I could get something built with limited input from me while on holiday. I'd start a prompt and let it fly, so I was using --dangerously-skip-permissions. For "real" dev during a work day I would not do that!!
Pembrokeshire. Some beautiful coastline to explore. Yeah, TBF it's our fault for buying an XC90 with huge damn wheels (wasn't easy to change on an uneven pull-in on a country lane). It's the first time I've had to buy a new tire, so the price surprised me too!!
At a motorway services on our way home from 11 days in Wales. It's been a nice break with lots of activities and outings for the girls. Had a surprise extra cost of £313 yesterday after bursting a tire in a nasty hidden pothole. Glad we could get it replaced rather than limp home on the spare!