#ARCAGI hashtag - Bluesky

@hendryadrian.bsky.social

2 days ago

Gemini 0.37%, Claude 0.25%, Grok 0%. Humans Destroyed Them All: ARC-AGI-3

ARC-AGI-3 is an interactive benchmark that drops agents into novel 64x64 grid environments with no instructions, exposing that frontier models score below 1% while humans solve 100% of the tasks. Anthropic’s Claude Dispatch lets a phone control a live desktop Claude session with full filesystem reach, amplifying prompt-injection risk and highlighting that these models lack the abstract reasoning needed to safely interpret adversarial context.

#ARC-AGI-3 #ClaudeDispatch

ARC-AGI-3 benchmark reveals frontier AI models like Gemini, Claude, and Grok score below 1% on novel tasks with no instructions, while humans achieve 100%. Highlights gaps in abstract reasoning and security risks in AI control systems. #ARCAGI #AIResearch

Winbuzzer

@winbuzzer.com

5 days ago

ARC-AGI-3 Offers $2M for AI Matching Human Reasoning

ARC Prize Foundation has launched ARC-AGI-3, an interactive benchmark offering over $2M to any AI matching human reasoning, where top models scored below 1%.

winbuzzer.com/2026/03/30/a...

#AI #ARCAGI #ARCAGI3 #AGI #AIBenchmarks #AIResearch #AICompetition #LLMs #DeepLearning #MachineLearning #FrançoisChollet #ARCPrizeFoundation

Timothy McGirl

@timothymcgirl.bsky.social

1 week ago

ARC-AGI-3

ARC-AGI-3 is the first interactive reasoning benchmark for AI agents: play as a human and build agents that learn in novel environments.

The gap between AI and human learning is the last frontier. I'm working on closing it.
#ARCAGI #ArtificialIntelligence #AGI #MachineLearning #OpenSource

arcprize.org/arc-agi/3

Ґізчина-GizChinaUkraine

@gizchinaukraine.bsky.social

1 month ago

Google unveils Gemini 3.1 Pro: AI for multi-step reasoning, 3D, and code

The race for frontier artificial intelligence continues: Google has just released Gemini 3.1 Pro, an update, ...

#Gemini31Pro #GoogleAI #ШтучнийІнтелект #AIмоделі #VertexAI #ARCAGI #GizchinaUkraine
gizchina.net/2026/02/22/g...

Evolution AI Hub

@evolutionaihub.bsky.social

1 month ago

Gemini 3.1 Pro Delivers 77% ARC AGI Score As Google Pushes Advanced Reasoning

Gemini 3.1 Pro posts a 77% ARC AGI score as Google strengthens its advanced AI reasoning stack across developers, enterprises, and paid users.

Gemini 3.1 Pro just posted a 77% ARC AGI score.
That’s more than double the reasoning performance of 3 Pro.

Google isn’t chasing chat polish.
It’s strengthening the thinking layer.

#Gemini31Pro #GoogleAI #AIModels #ARCAGI #EnterpriseAI #TechNews
evolutionaihub.com/gemini-3-1-p...

GetNews.me

@getnews-me.bsky.social

5 months ago

Tiny Recursive Model Beats Large Models on ARC-AGI Puzzles

The new Tiny Recursive Model (TRM) uses a two-layer network with just 7M parameters and reaches 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, outperforming larger LLMs. Read more: getnews.me/tiny-recursive-model-bea... #tinysrecursivemodel #arcagi #llm

Harald Klinke

@harald-klinke.de

11 months ago

The Man Out to Prove How Dumb AI Still Is

François Chollet has constructed the ultimate test for the bots.

If AI flunks François Chollet’s test, maybe it just struggles with colorful grids—not intelligence itself.
#AI #AGI #Intelligence #Chollet #ARCAGI #PhilosophyOfAI

気になるITニュース

@news-it.bsky.social

1 year ago

ITちゃんねる: What did comparing DeepSeek's reasoning model "DeepSeek-R1" with OpenAI's o1 and o3 reveal?

#DeepSeekR1 #ARCAGI #ARCPrize #ITニュース

MJ

@mjrun.bsky.social

1 year ago

ARC AGI performance by grid size

For smaller grids on the #ARCAGI test you may call #o3 "superhuman" (this depends on how you define superhuman). For larger grids the performance falls very quickly to below human performance.

This may be directly related to the number of tokens involved as grid size increases.
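
A rough way to see how quickly the token count could grow with grid size is sketched below. It rests on a loudly stated assumption: each grid cell costs one token and each row adds one separator token. Real tokenizers and prompt formats differ, and `grid_token_estimate` is a hypothetical helper, not part of any ARC tooling.

```python
def grid_token_estimate(height, width, tokens_per_cell=1, tokens_per_row_sep=1):
    """Estimate tokens for one grid serialized as text.

    Assumption (not from the post): one token per cell plus one
    separator token per row. Actual tokenization may differ.
    """
    return height * width * tokens_per_cell + height * tokens_per_row_sep


# Square grids up to 30x30, the largest size used in ARC-AGI-1 and ARC-AGI-2.
for side in (5, 10, 20, 30):
    print(f"{side}x{side}: ~{grid_token_estimate(side, side)} tokens per grid")
```

Under this assumption the count grows roughly with the square of the side length, so a task with several example grids gets long quickly as grids get larger.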

Andy Tseng

@andytseng.bsky.social

1 year ago

@melaniemitchell.bsky.social’s article sheds light on a genuine breakthrough in #AI, a shift that redefines its limits. Are we edging closer to human-level reasoning in ARC-AGI? If so, it’s a game-changer, and our understanding of AI will need a serious update. #ARCAGI #AIBenchmark #OpenAI

Dirk Roeckmann

@5troop.bsky.social

1 year ago

#AGI benchmarks should be developed by neutral orgs entirely in private, with no connection to the internet. Candidate #LLM models would then be tested. This would make it impossible to train or fine-tune models on any benchmark. Only afterwards would results be published. The only problem: leaks. #ARCAGI #ABAP

Sugato Ray

@sugatoray.bsky.social

1 year ago

🔥 #OpenAI o3 model performance makes a leap, sets a new high score on the #ARCAGI benchmark.

Source: arcprize.org/blog/oai-o3-...

#ml #ai #arcagi #benchmark #openai

気になるITニュース

@news-it.bsky.social

1 year ago

ITちゃんねる: OpenAI announces its next-generation AI model "o3", which achieved the feat of scoring "over 85%" on the ARC-AGI test

#o3 #12DaysofOpenAI #o3mini #ARCAGI #ITニュース
