Step 1: release GPT-Image-2 that produces stunning UI and web design images
Step 2: release GPT-5.5 with improved vision to take these images and produce stunning apps and websites.
Second step is soon. 😀
Posts by Kol Tregaskes
New version is faster, with more concise reasoning.
x.com/i/status/20...
The GitHub merge:
x.com/i/status/20...
Via:
x.com/i/status/20...
Original post:
x.com/i/status/20...
Is there anything you need help with? DeepSeek API (deepseek-reasoner)"
More in the thread below.
"I am the latest version of the DeepSeek model (deepseek-reasoner), with a context length of 1M tokens (approximately 1 million characters). I can process super-large texts in a single go, such as the full-length novel trilogy The Three-Body Problem.
DeepSeek's API has been updated; it appears to be a Lite DeepSeek v4. Repo merge was spotted earlier.
Screenshot translated with Grok:
And the drunk Sam reply. 😂
x.com/i/status/20...
Another OpenAI response confirming Codex will always be available on lower-tier and free plans:
x.com/thsottiaux/...
OpenAI's "100%" response. 😀
x.com/i/status/20...
Head of Growth's "2%" response thread:
x.com/TheAmolAvas...
Post on the price plan change:
x.com/TheGeorgePu...
Now Claude Code appears to be back on the pricing page. This is a bad look. If they’re compute-constrained, say so - honesty and transparency matter.
Thread below.
Anthropic pulled Claude Code from Pro plans and quietly updated the pricing page. Their Head of Growth called it a “test” affecting “2%” of users - but reports suggest it was broader than that.
Meanwhile, OpenAI delivered a crisp “100%” response, followed by Sam’s drunk "boomer" reply.
The post.
x.com/ericmitchel...
- overall feels like it gets users 90% of the way but stalls on the final 10%
- needs perfect web browsing and a lightweight agentic version that does not burn quota quickly
- should test its own outputs and use quicker sub-agents for simple tasks
- certain advanced agents and features are not yet available to everyone
- cannot watch or analyse video at human level or perform reliable computer/phone usage
- frontend capabilities and variety remain limited and chart-like
- office tasks such as opening browsers or editing Excel should be possible directly
Other practical frustrations
- loses track on large files or complex tasks instead of breaking them into steps
- struggles with design decisions, source authority and closing sub-agent sessions
- image generation in agentic flows cannot receive explicit prompts from the LLM, breaking context in workflows
- sidebar of infinite chats feels outdated; better organisation beyond basic projects or file trees is needed
Performance and task reliability
- coding improvements lag behind what an AGI should deliver
- Deep Research sometimes returns zero new information after long waits
- confusing split between ChatGPT and Codex platforms; models in Codex feel less engaging
- bad connectors for Gmail, Google Docs and similar tools
- some features Mac-only; needs broader platform support
- unclear usage limits and quotas on Pro plans make heavy use unpredictable
- tone, personality and guardrails shift unpredictably between releases
- needs stability so the experience feels reliable rather than erratic
Integration and accessibility
- poor iPhone integration across apps and lack of embedding into wearables, homes or everyday devices
- should use memories to tailor suggestions differently for each user
- requires transparent controls over which memories are kept or set aside
- overall representation of the user for long-term help and alignment is still missing
Consistency and stability
- should become always-on and ambient, potentially via glasses or AR for true daily integration
Memory and personalisation
- memory is shallow and often confuses or assumes the wrong details
- needs deeper, truly personal memory that knows preferences, projects and history over time
- needs to act on user data like workout schedules or loneliness mentions without being asked
- not agentic enough; fails to control computers, complete meaningful work or operate seamlessly in the background
- cannot trigger, start or end voice chat through voice alone
Proactivity and agentic behaviour
- lacks proactive life management such as automatic reminders or accountability
- should suggest actions based on personal preferences, opinions and projects rather than generic replies
- guardrails prevent natural phrases such as “love” or honest “no / I don’t know”; should allow friction and disagreement when warranted
Voice mode limitations
- voice mode is robotic, outdated and times out frequently
- needs a major upgrade to feel truly conversational
- writing style has regressed since its GPT-4.5 peak; now produces overly long, disconnected bullet lists instead of natural responses
- lacks perceptive understanding of user asks and context without extra engineering
- feels too robotic and soulless, especially since the GPT-5 series
- comes across as an alien trying to sound warm or funny when prompted
- needs to emulate human speech fluidity while keeping full text-model intelligence