
Posts by Fuwn

I've been running several automated tests on Sonnet 4.6 and GLM-5.1, and I can confidently say that, under the same conditions, GLM-5.1 sucks.

19 minutes ago 1 0 0 0

I hate to sound like a broken record, but Anthropic has some of the worst usage limits around.

I am blasting Codex with subagents 24/7, and I have over 80% of my limit remaining for this week.

Meanwhile, it takes just one or two prompts to hit my 5-hour limit with Claude.

20 hours ago 0 0 0 0

I've avoided GLM models until now because they kept responding to me in Chinese, but it turns out that was just an OpenCode Go issue.

I bought a GLM Coding Plan out of curiosity, and the model works fine there.

1 day ago 1 0 0 0

I dislike that I did this, but I resubscribed to Claude.

I still rely on Codex with the $200 subscription for most tasks, but having a second opinion is always helpful.

2 days ago 1 0 0 0

Kimi K2.5 Turbo is the only non-SOTA model that I've genuinely enjoyed using and that doesn't suck in my testing.

1 week ago 0 0 0 0
Terminal screenshot of a Codex session planning top-of-stack caching optimisation, showing status, notes, and updated implementation plan.

Genuinely, what is it saying ...

1 week ago 0 0 0 0
Dark-themed table displaying daily usage (Mar 25–30, 2026) for GPT-5.4 and GPT-5.4-mini, with columns for date, models, input tokens, output tokens, and cost; the totals row shows approximately 2.55 billion input tokens, 202.8 million output tokens, and a total cost of around $29,642.

My subagents are subagenting.

2 weeks ago 1 0 0 0

【OSHI NO KO】Season 3 was impact in its purest form.

Knowing how the manga ended, I'm uncertain about the final season, but it will be interesting to see it animated nonetheless.

2 weeks ago 1 0 0 0

My Codex usage has been quite quiet over the past week. I've been using it roughly as much as I usually do, but it shows considerably less activity than in previous weeks.

3 weeks ago 0 0 0 0

The AIO for my gaming PC finally failed. I didn't notice for about two weeks because I thought it was just my X3D CPU running at X3D temperatures.

3 weeks ago 1 0 0 0

GPT-5.4 is incredibly good at SwiftUI, which shouldn't surprise me.

Fundamentally, there are many reasons why it should perform well with it, but it's still a delight to see it resolve any bug I throw at it in a large SwiftUI application in almost no time.

3 weeks ago 0 0 0 0

Using GNU Make feels gross unless I want to feel extra C.

3 weeks ago 0 0 0 0
Screenshot of the Codex CLI displaying options A/B/C for test orchestration and a prompt to select, with a shell command preview underneath.

WTF is Codex on about? All I did was try out Garry Tan's gstack. It's done this multiple times now.

I know gstack is meant for Claude Code, but this is bizarre.

3 weeks ago 1 0 0 0

OpenCode Go's main appeal for me is having a set of hosted open-weight models to experiment with on the cheap and use as copium to pass the time until my SOTA limits reset.

I'm throwing the dumbest tasks at these chibis.

3 weeks ago 3 0 0 0
GitHub - gsd-build/gsd-2: A powerful meta-prompting, context engineering and spec-driven development system that enables agents to work for long periods of time autonomously without losing track of th...

GSD 2 is incredible.

In my personal testing, skills and prompt frameworks have never significantly improved the agentic workflow, but GSD 2, supported by the Pi harness, has.

It's further enhanced by the launch of GPT-5.4 mini, which allows GPT-5.4 to utilise quick yet intelligent subagents.

3 weeks ago 1 0 0 0

I've been anxious about burning through my ChatGPT Pro subscription on GPT-5.4, since I was down to 10% of my weekly allowance, but my cache hit rate is so high that I needn't have worried: it took me about three hours to go from 10% to 9%.
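As a back-of-the-envelope sketch of why a high cache hit rate slows the burn: if cached input tokens count against the allowance at a steep discount, most of each prompt's context is nearly free. The discount factor and function below are hypothetical illustrations, not OpenAI's actual billing formula.

```python
# Hypothetical sketch: cached input tokens weighted at a fraction of the
# uncached rate. The 0.1 discount is an assumption for illustration only.

def effective_tokens(input_tokens: int, cache_hit_rate: float,
                     cached_discount: float = 0.1) -> float:
    """Return input tokens weighted as if cached ones cost a tenth as much."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return uncached + cached * cached_discount

# 10M input tokens at a 95% cache hit rate count like ~1.45M uncached tokens,
# so usage drains far more slowly than the raw token count suggests.
print(effective_tokens(10_000_000, 0.95))
```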

4 weeks ago 1 0 0 0

GLM-5 makes so many character set and punctuation mistakes in its responses that I’m seriously unsure how anyone would use it for serious software. It seems fine for personal use, though.

4 weeks ago 1 0 0 0

GPT-5.4 performs considerably better in long-term autonomous tasks and low-level reverse engineering, while Claude Opus 4.6 seems to excel more at higher-level work, such as frontend development. Both are technically impressive, but I find the entire Codex ecosystem much more comfortable to work in.

4 weeks ago 1 0 0 0

I'm still running low on GPT-5.4 usage in Codex, so I've been postponing actual work to run harness experiments using Codex-5.3-Spark on Codex CLI, Pi, OpenCode, and ForgeCode.

Codex CLI and Pi are significantly better than the competition. Depending on the task, Codex CLI can far surpass Pi.

4 weeks ago 3 0 1 0

I could use a reset right about now. I would have already bottomed out if I hadn’t cut myself off a few hours ago at 15% remaining. 😬

1 month ago 1 0 0 0
Bar chart illustrating daily Codex usage over the past 30 days on a dark background. Each day is represented by a vertical bar. Early days show smaller red bars with moderate fluctuation. In the middle period, activity decreases with shorter pink bars. Towards the end of the timeline, usage rises considerably with several tall pink bars, including the highest peaks in the final third of the chart.

Can you tell when I started using GPT-5.4?

1 month ago 3 0 0 1
Screenshot of a dark-themed leaderboard table with columns “Rank,” “Agent,” “Rating (90% CI),” and “Δ.” Rank 1 is gpt-5-4-xhigh with a rating of 2035 (90% CI: 1979–2109) and a +1 change. Rank 2 is gpt-5-4-high with a rating of 1994 (90% CI: 1933–2048) and a −1 change. Green status dots appear next to both agent names.

Well, well, well. How the turntables.

voratiq.com/leaderboard

1 month ago 1 0 0 0
Screenshot of a dark-themed interface showing an AI “thinking process.” The text states it needs to locate the path for `platform.h` under the `targets` directory and explore it. A section labelled **Explored** lists steps: “List targets,” “Search platform.h$,” and “Read platform.h.” A final line explains the aim is to find memory configuration details for a standalone target to understand how the memory setup is structured in that file.

I love reading Codex's (GPT-5.4 xhigh Fast) thought process. It's been such a little ball of sunshine lately.

It's enthusiastic about everything.

1 month ago 2 0 0 0

Managing parallel agents is quite enjoyable. Perhaps I could pursue a future as a manager at a large tech firm, leading a team of AI agents and having my subordinates review every line of code.

1 month ago 1 0 0 0
Screenshot of a settings panel titled “Codex.” The description reads “Save additional Codex model slugs for the picker and /model command.” Below is a field labelled “Custom model slug” with placeholder text “your-codex-model-slug.” An example slug “gpt-6.7-codex-ultra-preview” is highlighted. At the bottom, it shows “Saved custom models: 0.”

GPT-6.7 will be AGI.

1 month ago 1 0 0 0

I just had to kill cmux.dev again after a roughly five-hour session for using 78.33 GB of physical memory.

Quality software.

1 month ago 0 0 0 0
Screenshot of the ChatGPT website's left sidebar navigation in dark mode, showing menu items with icons: Pulse, Images, Apps, Deep research, Codex, and Aardvark.

Aardvark.

1 month ago 1 0 0 0

What's even worse is that I was also trying out cmux.dev and had it open for about 16 hours, only to find it using 71.95 GB of physical memory.

So, in total, the Codex app and cmux.dev were using 105.71 GB of physical memory before I force-quit everything.

Quality software, isn't it?

1 month ago 1 0 0 1
Codex Security: now in research preview Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.

It was already clear that OpenAI was on a release spree, and they are still not stopping.

openai.com/index/codex-...

1 month ago 0 0 0 0

GPT-5.4's independence is truly unmatched.

My first and only prompt was completed in 15 hours, 51 minutes, and 9.173 seconds using GPT-5.4 xhigh with Fast mode enabled.

It was addressing a novel problem with no historical precedent, external resources, or human input.

#OpenAI #ChatGPT #Codex #AI

1 month ago 1 0 0 0