Smart Chunks (@smartchunksblog) Bsky

LMSys Arena Elo April 2026: How To Actually Read It
The LMSys Chatbot Arena ranking is a human preference poll scored with a Bradley-Terry model, not a benchmark. As of April 19, 2026 Claude Opus 4.7 Thinking leads at roughly 1505 Elo with Anthropic holding four of the top five slots inside a 20-point coin-flip cluster. What the leaderboard measures, how Arena rankings diverge from Intelligence Index capability scores, and what the Cohere 'Leaderboard Illusion' paper showed about selective disclosure.

Top 6 models on LMSys Arena sit within 20 Elo points.

That's a 53% win rate under the actual math. Three points above random.

If you're picking models by Arena rank, you're flipping coins.
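
The coin-flip claim can be checked with the standard Elo expected-score formula. This is a minimal sketch, assuming the conventional base-10, 400-point scale; the function name and the `scale` parameter are illustrative and not taken from the Arena codebase.

```python
def elo_win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model for a given rating gap.

    Assumes the conventional Elo logistic curve (base 10, 400-point scale).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # A 20-point gap: the spread quoted for the top cluster.
    print(f"{elo_win_probability(20):.3f}")  # roughly 0.529
    # A zero-point gap is an exact coin flip.
    print(f"{elo_win_probability(0):.3f}")   # 0.500
```

Under that assumption, a 20-point lead means the top model is preferred in only about 53 of every 100 head-to-head votes.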

How to read it honestly:
smartchunks.com/lmsys-arena...

1 hour ago

Siemens just shipped the Eigen Engineering Agent — an autonomous AI that executes industrial engineering workflows, not just suggests them. Part of a €1B bet backed by 1,500+ AI experts and 2,000+ patents. This is agentic AI hitting the factory floor.

smartchunks.com/siemens-eig...

3 hours ago

Samsung just unveiled Project Luna, a rolling AI robot designed to be the brain of your smart home. But instead of talking, it just beeps. It's a bold bet on character over conversation in the race to put a robot in your living room. smartchunks.com/samsung-pro...

19 hours ago

GPQA Diamond Score Explained: The AI Benchmark That Actually Matters
Frontier AI models score above 94% on GPQA Diamond while PhD experts score 65%. What the 198-question graduate-science benchmark measures, how the top four models of April 2026 (Mythos Preview, Gemini 3.1 Pro, Opus 4.7, GPT-5.4) compare, why no single leaderboard is canonical, and what's replacing Diamond as saturation sets in.

AI scores 94% on GPQA Diamond. PhDs score 65%.

But the top 4 models are within 0.5 points — one question on a 198-question test. Not a ranking.

Creators estimate 8% of questions have errors. Ceiling ~92%, not 100%.
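
The "one question" reading of a half-point spread is simple arithmetic on the 198-question test size. A small sketch, with helper names that are purely illustrative:

```python
TOTAL_QUESTIONS = 198  # size of the GPQA Diamond set, as stated in the post


def points_per_question(total: int = TOTAL_QUESTIONS) -> float:
    """Percentage points that a single question is worth."""
    return 100.0 / total


def questions_for_gap(gap_points: float, total: int = TOTAL_QUESTIONS) -> float:
    """Number of questions that a score gap (in percentage points) represents."""
    return gap_points * total / 100.0


if __name__ == "__main__":
    print(f"{points_per_question():.3f}")    # about 0.505 points per question
    print(f"{questions_for_gap(0.5):.2f}")   # about 0.99 questions
```

So a 0.5-point gap between two models corresponds to a difference of about one answered question.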

smartchunks.com/gpqa-diamon...

1 day ago

Shoplazza is launching what it calls the world's first AI-native commerce OS, betting that multi-agent automation can replace the entire ecosystem of manual e-commerce tools. Is this the moment Shopify finally feels real pressure? smartchunks.com/shoplazza-a...

1 day ago

A single speech in Ravenna, Ohio, just torched plans for a new AI data center. Now, communities nationwide are questioning Big Tech's massive water consumption, putting hyperscalers like Google and Amazon on the defensive. Who pays AI's real price? smartchunks.com/ravenna-ohi...

1 day ago

Artificial Analysis Intelligence Index April 2026 Explained
Three frontier AI models are tied at 57 on the Artificial Analysis Intelligence Index as of April 2026. Claude Opus 4.6 sits at 53. What the composite score is measuring, what it hides, how a one-point gap should actually be read, and why Opus 4.7 leads GDPval-AA by 439 Elo while sharing the same integer 57 with Gemini 3.1 Pro.

3 AI models tied at 57 on AA Intelligence Index. But on real-deliverable tasks Opus beats Gemini by 439 Elo. On research benchmarks Gemini crushes Claude.

Same score. Very different models.

smartchunks.com/artificial-...

2 days ago

The first major AI-native smartphone has no apps. Brain Technologies' Natural OS just launched across 5,000+ SoftBank stores in Japan, betting users are ready to ditch the icon grid for pure AI. Is the era of Apple and Google's app dominance over? smartchunks.com/brain-techn...

2 days ago

Top Frontier AI Models April 2026: Real Ranking (Not Marketing)
Three frontier models are tied at 57 on the Artificial Analysis Intelligence Index as of April 2026: Claude Opus 4.7, GPT-5.4, and Gemini 3.1 Pro. Claude Mythos Preview sits off the public leaderboard. The open-source tier landed six points back at a fraction of the price. The real ranking, decomposed.

Three frontier AI models tied at 57 on the Intelligence Index: Opus 4.7, GPT-5.4, Gemini 3.1 Pro.

5x price gap between them. Open source sits 6 points behind at a tenth of the cost.

April 2026 ranking:

smartchunks.com/top-frontie...

2 days ago

Anthropic just launched Claude Design, an AI tool that generates prototypes and even production code from a simple text prompt. Investors noticed immediately, sending Figma's stock down 7% on the news. The design tool landscape just got a lot more competitive. smartchunks.com/anthropic-c...

2 days ago

Gemini 3.1 Pro Benchmarks Decoded: GPQA 94.3%, SWE 80.6%, Full Results
Gemini 3.1 Pro leads GPQA Diamond (94.3%) and shares the top Intelligence Index slot. Complete benchmark breakdown: GPQA, HLE, LMSys, FrontierMath, and where it actually beats GPT-5.4 and Claude Opus 4.6.

GPQA Diamond leader: Gemini 3.1 Pro at 94.3%.
Intelligence Index: 57.17 (tied with GPT-5.4 at 57.18).
Cost: $2/$12 per M tokens — cheapest frontier by far.

All three facts. All verified. Where Gemini actually beats GPT-5.4, and where it doesn't:
smartchunks.com/gemini-3-1-...

2 days ago

Intel just put its most advanced 18A silicon into budget Core 3 chips, delivering 40 TOPS of AI grunt for as little as $600. This isn't just a laptop play — it's a direct shot at NVIDIA's dominance in edge AI. The math for value PCs just changed. smartchunks.com/intel-core-...

3 days ago

Meta just committed 1 gigawatt to custom AI chips with Broadcom — and plans multiple gigawatts by 2027. Broadcom's stock jumped 3% while Meta's stayed flat. The hyperscalers aren't just diversif... smartchunks.com/meta-broadc...

3 days ago

Microsoft just announced it's building its own frontier models — not as a side project, but as a strategic bet to reduce OpenAI dependence. Even with IP rights through 2032, the cloud giant can... smartchunks.com/microsoft-b...

3 days ago

Intel and Google just announced custom chip co-development targeting AI inference and cloud infrastructure — a direct play to fragment Nvidia's accelerator dominance. No specs, no financials, but... smartchunks.com/intel-googl...

3 days ago

Microsoft just shipped Copilot Health — an AI that actually reads your medical records and wearable data to help you prep for doctor visits. It's not diagnosing anything. It's translating your own health da... smartchunks.com/microsoft-c...

3 days ago

April 2026 just gutted the closed-model business. Google dropped Gemma 4 31B (89.2% AIME), Zhipu shipped GLM-5.1 under MIT license (beats Claude Opus 4.6), and Alibaba released Qwen3.6-Pl... smartchunks.com/google-zhip...

3 days ago

Anthropic just launched Claude Managed Agents — a fully managed platform that claims to cut agent deployment time by 10x. This isn't just a product. It's a direct challenge to OpenAI's agent tools and t... smartchunks.com/anthropic-c...

3 days ago

CoreWeave just landed Anthropic as a major customer in a multi-year deal to host Claude models — first servers online in 2026. The GPU cloud provider's annualized revenue hit $30B. That's a wild number for... smartchunks.com/coreweave-a...

3 days ago

Broadcom just locked in a long-term deal to manufacture Google's custom TPUs — part of that massive 3.5GW infrastructure deal with Anthropic. Stock popped 4% because investors finally see concrete AI revenue ... smartchunks.com/broadcom-go...

3 days ago

Microsoft just dropped three first-party AI models that beat OpenAI Whisper and Google Gemini on benchmarks — while running 50% cheaper. MAI-Transcribe-1 hits 3.9% Word Error Rate and tra... smartchunks.com/microsoft-s...

3 days ago

NVIDIA just released Ising — the first open-source AI models built specifically for quantum computing. 2.5x faster error correction, 3x better accuracy than current tools, and calibration time slashed fro... smartchunks.com/nvidia-isin...

3 days ago

Meta just shipped Muse Spark — its first model from the new Superintelligence Labs, built after Llama 4 crashed and burned. It reportedly matches OpenAI and Google on benchmarks, powers Meta AI across b... smartchunks.com/meta-launch...

3 days ago

Alibaba just deployed autonomous AI agents to millions of merchants on Taobao and Tmall — handling pricing, vouchers, and customer service without human input. This is the largest live agentic AI ro... smartchunks.com/alibaba-aut...

3 days ago

C3 AI just declared assisted development dead. Its new C3 Code platform builds production-grade enterprise apps from plain English prompts — no developers required. CEO says it's the end of an... smartchunks.com/c3-ai-c3-co...

3 days ago

OnePlan just shipped its April 2026 release with AI automation aimed at enterprise PMOs drowning in manual portfolio management work. The pitch: let AI handle the busywork so humans can focus o... smartchunks.com/oneplan-apr...

3 days ago

GPT-5.4-Cyber Explained: OpenAI's Trusted Access For Cyber Program
OpenAI's new cybersecurity model is now available to thousands of verified defenders. What GPT-5.4-Cyber does, how Trusted Access for Cyber works, who qualifies.

GPT-5.4-Cyber launched April 14 to thousands of verified defenders.

Context: OpenAI's Codex Security has already fixed 3,000+ critical and high-severity vulnerabilities.

How Trusted Access for Cyber tiers work, who qualifies:
smartchunks.com/gpt-5-4-cyb...

3 days ago

Anthropic's Claude just got a major upgrade. A new connector from Lucid lets the AI search, summarize, and even generate complex diagrams directly within a chat. It's a direct shot at making... smartchunks.com/lucid-claud...

3 days ago

Anthropic just stopped selling AI models and started running your workflows instead. Claude Managed Agents embed automation into their platform — making it way harder to leave. Notion, Asana, and R... smartchunks.com/anthropic-c...

3 days ago

Amagi just shipped Newspulse — an agentic AI that watches live broadcasts and autonomously cranks out digital content for every platform. No human supervision. June 2026 launch. The bet: newsrooms ... smartchunks.com/amagi-newsp...

3 days ago
