Data contamination threatens #LLM #AIEvaluation
Scaling has “limits to growth”. The new #ARCAGI2 counters this problem with contamination-resistant, compositional reasoning tests and human baselines that require original reasoning, not just memory recall. arxiv.org/abs/2505.11831
Google Gemini’s Deep Think just crushed the ARC‑AGI‑2 benchmark, and Nvidia dropped a fresh open‑source kit for autonomous driving. Plus, Cosmos Cookbook and Flux.2 updates from Black Forest Labs. Dive into the details! #GoogleGemini #ARCAGI2 #NvidiaAI
🔗 aidailypost.com/news/google-...
💡 ARC-AGI-2 stumps the most advanced AI models
gomoot.com/arc-agi-2-me...
#agi #arcagi2 #arcprize #benchmark #blog #chatgpt #claude #deepseekr1 #geminiflash #news #openai #picks #sonnet #tech #tecnologia
New ARC-AGI-2 test shows: HUMANS BEAT AI!
AI models fail miserably at the ARC-AGI-2 test, while humans solve it with ease! 🤯 New benchmark reveals glaring weaknesses in current AI.
#ai #ki #agi #arcagi2 #künstlicheintelligenz #artificialintelligence
kinews24.de/arc-agi-2/
New ARC-AGI-2 Test Exposes AI’s Intelligence Limits
wiobs.com/new-arc-agi-...
#ArtificialIntelligence #ARCAGI2 #OpenAI #GeneralIntelligence #AIbenchmark