Do you have a new model/prompt/algorithm that might work? Try it out! Code is open-source, and the leaderboard will be updated regularly with the best-performing models!
Posts by Ben Eysenbach
Here's one way of building that pyramid. It's not as easy as it seems, because you have to temporarily use blocks to build scaffolding. It requires a sort of _physical_ reasoning.
Maybe not as simple as it seemed...
Leaderboard: On almost all tasks, success rates are 0% for Claude Opus 4.6, Gemini 3 Flash, and GPT 5.2.
rajghugare19.github.io/builderbench...
🏗️BuilderBench is a benchmark for answering this question!
github.com/RajGhugare19...
Let's start simple: we'll ask Claude Opus 4.6 to build this pyramid using a robotic arm.
Easy, right?
🧠🔭Today's AI models synthesize knowledge acquired from the internet/books/etc. Ultimately, that knowledge usually derives from real experiments. We know (say) the moon's mass because a human did a science experiment.
How well do AI models fare at generating knowledge? 🤔
🤖Excited to share SLAP, @yijieisabelliu.bsky.social's new algorithm that uses RL to provide better skills for planning!
Check out the website for code, videos, and pre-trained models: github.com/isabelliu0/S...
Kids spend years playing with blocks, building spatial+arithmetic skills. Today, AI models just read.
While AI research often conflates reasoning with language models, block-building lets us study how embodied reasoning might emerge from exploration and trial-and-error learning!
🚨 Excited to announce our #NeurIPS2025 Workshop: Data on the Brain & Mind
📣 Call for: Findings (4- or 8-page) + Tutorials tracks
🎙️ Speakers include @dyamins.bsky.social @lauragwilliams.bsky.social @cpehlevan.bsky.social
🌐 Learn more: data-brain-mind.github.io
New research directions:
* model-based RL with NF models,
* goal/language-conditioned NF foundation policies,
* NFs for collocation-based planning,
* goal-conditioned NF value functions (as control barrier functions, as Lyapunov functions).
👆Join/scoop us -- we can't do it all!
2/ Much of my past research is about avoiding density estimation in RL, because I've assumed that it's difficult and fickle. But, if NFs make it easy to do high-dim density estimation, there are lots of new RL algorithms to be developed:
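To make the appeal concrete, here's a minimal sketch of how a normalizing flow gives exact log-densities via the change-of-variables formula — the property that would unlock the algorithms above. This is an illustrative, untrained affine coupling layer (RealNVP-style); all names and shapes here are my own, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # state dimension (illustrative)

# Random (untrained) parameters for the scale/shift maps; in practice
# these would be small MLPs trained by maximum likelihood.
W_s = rng.normal(scale=0.1, size=(D // 2, D // 2))
W_t = rng.normal(scale=0.1, size=(D // 2, D // 2))

def coupling_forward(x):
    """Map x -> z with one affine coupling layer; return log|det Jacobian|."""
    x1, x2 = x[:, : D // 2], x[:, D // 2 :]
    s = np.tanh(x1 @ W_s)        # log-scale, bounded for stability
    t = x1 @ W_t                 # shift
    z2 = x2 * np.exp(s) + t      # transform only the second half
    z = np.concatenate([x1, z2], axis=1)
    log_det = s.sum(axis=1)      # Jacobian is triangular, so this is exact
    return z, log_det

def log_density(x):
    """Exact log p(x) under a standard-normal base distribution."""
    z, log_det = coupling_forward(x)
    log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * D * np.log(2 * np.pi)
    # change of variables: log p(x) = log p_base(z) + log|det dz/dx|
    return log_base + log_det

x = rng.normal(size=(8, D))
print(log_density(x))  # one exact log-density per state
```

The key point: unlike diffusion models or EBMs, the log-density comes out in one cheap forward pass, which is what makes NFs plug-and-play inside RL objectives.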
Check out @raj-ghugare.bsky.social's new paper on the surprising effectiveness of normalizing flows (NFs) in RL 🚀
This project changed my mind in 2 ways:
1/ Diffusion policies, flow-models, and EBMs have become ubiquitous in RL. Turns out NFs can perform as well -- no ODEs/SDEs required!
While we still don't understand precisely why depth helps so much, the benefits seem correlated with exploration. Thought experiment: What if the answer to the exploration problem in RL were to just increase network depth?
tldr: increase the depth of your RL networks by several orders of magnitude.
Our new paper shows that very very deep networks are surprisingly useful for RL, if you use resnets, layer norm, and self-supervised RL!
Paper, code, videos: wang-kevin3290.github.io/scaling-crl/
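To show the recipe's shape, here's a minimal numpy sketch of the architectural ingredients named above: residual blocks with layer norm, stacked far deeper than a typical RL network. The widths, depth, and pre-LN ordering are illustrative assumptions, not the paper's exact architecture — see the linked code for that.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16       # hidden width (illustrative)
DEPTH = 64   # far deeper than the usual 2-3 layer RL MLP

def layer_norm(h, eps=1e-5):
    """Normalize each feature vector to zero mean, unit variance."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

# One (untrained) weight matrix per residual block.
weights = [rng.normal(scale=1.0 / np.sqrt(D), size=(D, D))
           for _ in range(DEPTH)]

def forward(x):
    h = x
    for W in weights:
        # pre-LN residual block: h <- h + ReLU(LN(h) @ W)
        # The skip connection keeps gradients flowing even at depth 64+.
        h = h + np.maximum(layer_norm(h) @ W, 0.0)
    return h

x = rng.normal(size=(2, D))
out = forward(x)
print(out.shape)
```

The residual connections and layer norm are what let the signal survive such depths; without them, activations at layer 64 would either explode or vanish.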
Excited to share new work led by @vivekmyers.bsky.social and @crji.bsky.social that proves you can learn to reach distant goals by training solely on nearby goals. The key idea is a new form of invariance. This invariance implies generalization w.r.t. the horizon.