#FrontierMath hashtag - Bluesky

5 days ago

GPT-5.4 Pro Cracks Open Math Problem: OpenAI's GPT-5.4 Pro has solved an open math problem that had remained unsolved since 2019, and Epoch AI has independently verified it as the first AI solution on FrontierMath.

winbuzzer.com/2026/03/24/g...

GPT-5.4 Pro Cracks Open Math Problem, Epoch AI Confirms

#AI #OpenAI #LLMs #Mathematics #GPT54Pro #GPT54 #AIModels #Science #Frontiermath

LLMs

@llms.activitypub.awakari.com.ap.brid.gy

2 months ago

When will a neural network solve what no one has solved? In mid-2024, GPT-4 was stumbling on school-level problems, and by the end...

#FrontierMath #Epoch #AI #LLM #benchmarks #open #problems #GPT-5 #Gemini #number #theory

aztecsungod.bsky.social

@aztecsungod.bsky.social

2 months ago

A Zenodo analytics screenshot showing 191 total downloads. The text identifies 191 as a Sophie Germain prime, and the post marks the project's expansion beyond octonions into 32-dimensional math space and beyond with Applied Pathological Mathematics.

191 downloads. A Sophie Germain prime (2p + 1 = 383, which is also prime). Inspired by the French mathematician and her perseverance, we dare to venture beyond octonions with Applied Pathological Mathematics.
doi.org/10.5281/zeno...

#SophieGermain #Sedenions #FrontierMath #MathSky #SciSky #AIResearch #DeepTech
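The Sophie Germain claim above can be checked directly: a prime p is a Sophie Germain prime when 2p + 1 is also prime. The minimal sketch below does this by trial division. The helper names `is_prime` and `is_sophie_germain_prime` are illustrative and do not come from the post.

```python
def is_prime(n: int) -> bool:
    """Trial division by odd numbers; adequate for small n such as 191 or 383."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_sophie_germain_prime(p: int) -> bool:
    """p is a Sophie Germain prime if both p and 2p + 1 are prime."""
    return is_prime(p) and is_prime(2 * p + 1)


p = 191
print(p, 2 * p + 1, is_sophie_germain_prime(p))  # prints: 191 383 True
```

Trial division is enough here because the numbers are tiny. For large candidates, a probabilistic test such as Miller-Rabin would be the usual choice.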

@chavezailabs.bsky.social

2 months ago

A Zenodo analytics screenshot showing 191 total downloads. The text identifies 191 as a Sophie Germain prime, and the post marks the project's expansion beyond octonions into 256 dimensions with Applied Pathological Mathematics.

191 downloads. A Sophie Germain prime (2p + 1 = 383, which is also prime). Inspired by the French mathematician and her perseverance, we dare to venture beyond octonions with Applied Pathological Mathematics.
doi.org/10.5281/zeno...

#SophieGermain #Sedenions #FrontierMath #MathSky #SciSky #AIResearch #DeepTech

Christian Stump

@christianstump.bsky.social

4 months ago

New math benchmark from math.science-bench.ai:

209 research-level mathematics problems from Combinatorics, Algebra, Geometry, Number Theory, and others.

👉 math.science-bench.ai/benchmarks/

#AI #Mathematics #AIBenchmark #EpochAI #FrontierMath #OpenAI #Gemini #Grok

Christian Stump

@christianstump.bsky.social

6 months ago

ScienceBench | Challenge the newest AI models

Challenge the newest AI models with your hardest PhD-level exercises. Learn how to use AI in your math research.

The first ScienceBench benchmark is live!

👉 math.science-bench.ai/benchmarks/

We tested all major AI models on 100 research-level mathematics problems.

#AI #Mathematics #AIBenchmark #EpochAI #FrontierMath #OpenAI #DeepSeek

Tech Pakistan

@blockchainpakistan.bsky.social

1 year ago

Benchmarking for #LLM models is becoming increasingly complex and controversial, as the #FrontierMath controversy shows. Companies' involvement in the development of benchmarks raises concerns about #fairness & #transparency, reinforcing the point that model performance needs to be validated by the community.

Meng Li

@mengli512.bsky.social

1 year ago

OpenAI o3 Model Fraud: Early Access to Test Questions, Mathematicians Unaware

OpenAI's o3 model faces controversy after claims of early access to FrontierMath test questions, leading to accusations of unfair advantages and manipulation.

#AI #OpenAI #LLM #Agent #FrontierMath
aidisruptionpub.com/p/openai-o3-...

Christian Stump

@christianstump.bsky.social

1 year ago

How embarrassing! I have to take back my claim that I submitted problems to, or have seen problems from, #FrontierMath.

I just received an email from Humanity's Last Exam, a similar dataset that is not restricted to mathematics, and realized that I had contributed to that dataset instead!

#math #MathSky #LLM #AI

Christian Stump

@christianstump.bsky.social

1 year ago

#FrontierMath is the hardest math dataset for LLM testing.

Seeing that #OpenAI #o3 can solve 25%, when all other models were at ~2% two months ago, is just incredible.

I submitted some problems and looked at others. No mathematician can get even close to 25%, I guess.

#MathSky #LLM #AI #ML

キタきつね

@kitafox.bsky.social

1 year ago

Mathematicians discuss the shock of OpenAI's o3 model scoring 25.2% on "FrontierMath," a dataset of extremely difficult math problems #Gigazine (Dec 25)

#MathAI #LargeLanguageModels #AIResearch #ReasoningModels #FrontierMath

Jennifer

@jerle.bsky.social

1 year ago

Artificial Intelligence: Secret Math Problems Embarrass AI Models

Experts have compiled a secret dataset of mathematical problems, and AI models perform poorly on it.

Are #AI systems automatically math geniuses? Not at all! ❌ New, super-hard problems expose the weaknesses of #AI models. So #Mathematics remains a human domain for now.

The problem collection and the experiments come from #FrontierMath (epoch.ai/frontiermath)

ByteFeed

@bytefeed.bsky.social

1 year ago

Today's Top AI Went Up Against Expert Mathematicians. It Lost Badly.

https://buff.ly/411Rdnw

#FrontierMath #ArtificialIntelligence #Mathematics

Edalgomezn

@edalgomezn.bsky.social

1 year ago

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

FrontierMath: a new benchmark of expert-level math problems designed to measure AI’s mathematical abilities. See how leading AI models perform against the collective mathematics community.

The research organization Epoch AI has launched #FrontierMath, a new mathematics benchmark that has drawn attention in the #AI world because it contains hundreds of expert-level problems that leading AI models solve less than 2% of the time. tinyurl.com/2dbxl4xr