
Posts by Ben Eysenbach

Do you have a new model/prompt/algorithm that might work? Try it out! Code is open-source, and the leaderboard will be updated regularly with the best-performing models!

2 weeks ago

Here's one way of building that pyramid. It's not as easy as it seems, because you have to temporarily use blocks to build scaffolding. It requires a sort of _physical_ reasoning.

2 weeks ago

Maybe not as simple as it seemed...

Leaderboard: On almost all tasks, success rates are 0% for Claude Opus 4.6, Gemini 3 Flash, and GPT 5.2.
rajghugare19.github.io/builderbench...

2 weeks ago

🏗️BuilderBench is a benchmark for answering this question!
github.com/RajGhugare19...

Let's start with something simple: we'll ask Claude Opus 4.6 to build the pyramid here using a robotic arm.

Easy, right?

2 weeks ago

🧠🔭Today's AI models synthesize knowledge acquired from the internet/books/etc. Ultimately, that knowledge usually derives from real experiments. We know (say) the moon's mass because a human did a science experiment.

How well do AI models fare at generating knowledge? 🤔

2 weeks ago

🤖Excited to share SLAP,
@yijieisabelliu.bsky.social's new algorithm that uses RL to provide better skills for planning!

Check out the website for code, videos, and pre-trained models: github.com/isabelliu0/S...

5 months ago

Kids spend years playing with blocks, building spatial+arithmetic skills. Today, AI models just read.

While AI research often conflates reasoning with language models, block-building lets us study how embodied reasoning might emerge from exploration and trial-and-error learning!

6 months ago

🚨 Excited to announce our #NeurIPS2025 Workshop: Data on the Brain & Mind

📣 Call for: Findings (4- or 8-page) + Tutorials tracks

🎙️ Speakers include @dyamins.bsky.social @lauragwilliams.bsky.social @cpehlevan.bsky.social

🌐 Learn more: data-brain-mind.github.io

8 months ago

New research directions:
* model-based RL with NF models,
* goal/language-conditioned NF foundation policies,
* NFs for collocation-based planning,
* goal-conditioned NF value functions (as control barrier functions, as Lyapunov functions).
👆Join/scoop us -- we can't do it all!

10 months ago

2/ Much of my past research is about avoiding density estimation in RL, because I'd assumed it was difficult and fickle. But if NFs make high-dimensional density estimation easy, there are lots of new RL algorithms to be developed:
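To make the change-of-variables idea concrete, here's a minimal, hypothetical sketch of exact density estimation with a flow -- a single hand-set affine transform standing in for the deeper coupling-based NFs used in practice (the 1-D setup and the `scale`/`shift` values are illustrative placeholders, not from the paper):

```python
# Minimal sketch of exact density estimation with a normalizing flow.
# A single affine transform x = scale * z + shift stands in for a deep
# coupling-based flow; the parameters are fixed by hand for illustration.
import numpy as np

rng = np.random.default_rng(0)
scale, shift = 2.0, 1.0  # placeholder "learned" flow parameters

def log_prob(x):
    """Exact log p(x) via change of variables:
    log p(x) = log N(z; 0, 1) + log |dz/dx|, with z = (x - shift) / scale."""
    z = (x - shift) / scale
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))  # standard-normal log-density
    log_det = -np.log(scale)                       # log-Jacobian of the inverse map
    return log_base + log_det

def sample(n):
    """Draw samples by pushing base noise through the forward map."""
    z = rng.standard_normal(n)
    return scale * z + shift

xs = sample(100_000)
# The flow's density matches the distribution it samples from.
print(round(xs.mean(), 1), round(xs.std(), 1))  # ≈ shift, scale
```

The same two-line recipe (base log-density plus log-determinant) is what makes NFs give *exact* likelihoods, in contrast to the ELBO-style bounds or simulation-based objectives of diffusion and SDE models.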

10 months ago

Check out @raj-ghugare.bsky.social's new paper on the surprising effectiveness of normalizing flows (NF) in RL 🚀

This project changed my mind in 2 ways:
1/ Diffusion policies, flow models, and EBMs have become ubiquitous in RL. Turns out NFs can perform just as well -- no ODEs/SDEs required!

10 months ago

While we still don't understand precisely why depth helps so much, the benefits seem correlated with exploration. Thought experiment: What if the answer to the exploration problem in RL were to just increase network depth?

1 year ago

tl;dr: increase the depth of your RL networks by several orders of magnitude.

Our new paper shows that very, very deep networks are surprisingly useful for RL if you use resnets, layer norm, and self-supervised RL!

Paper, code, videos: wang-kevin3290.github.io/scaling-crl/
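As a toy illustration of the ingredients named above (residual connections + layer norm), here is a hypothetical numpy sketch of one such deep stack; the width, depth, and initialization are my own placeholder choices, not the paper's:

```python
# Toy sketch of a deep residual stack with layer norm, the two
# ingredients that make very deep RL networks trainable.
# DIM and DEPTH are illustrative placeholders, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
DIM, DEPTH = 64, 32

def layer_norm(x, eps=1e-5):
    """Normalize each feature vector to zero mean, unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# One weight matrix per residual block (biases omitted for brevity).
weights = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(DEPTH)]

def forward(x):
    """Stack of pre-norm residual blocks: x <- x + relu(layer_norm(x)) @ W."""
    for W in weights:
        h = np.maximum(layer_norm(x), 0.0)  # ReLU after normalization
        x = x + h @ W                       # skip connection keeps the signal stable
    return x

obs = rng.standard_normal((8, DIM))  # batch of fake observations
out = forward(obs)
print(out.shape)  # (8, 64)
```

The skip connection plus pre-activation layer norm is what lets the depth scale: each block adds a bounded perturbation rather than repeatedly multiplying the signal, so activations stay finite even with many stacked blocks.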

1 year ago

Excited to share new work led by @vivekmyers.bsky.social and @crji.bsky.social that proves you can learn to reach distant goals by training solely on nearby goals. The key idea is a new form of invariance, which implies generalization w.r.t. the horizon.

1 year ago