1/13 New Paper!! We try to understand why some LMs self-improve their reasoning while others hit a wall. The key? Cognitive behaviors! Read our paper on how the right cognitive behaviors can make all the difference in a model's ability to improve with RL! 🧵
Posts by Ulyana Piterbarg
Thank you to @sloanfoundation.bsky.social for this generous award to our lab. Hopefully this will bring us closer to building truly general-purpose robots!
(Many) more details in our paper! arxiv.org/abs/2410.02749
LMs trained to synthesize programs by repeatedly editing their own generations produce more diverse code than baselines
This improves the trade-off between test-time FLOPs and pass@k
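For reference, pass@k is commonly computed with the unbiased estimator from Chen et al. (2021): draw n samples per problem, count the c that pass the tests, and estimate the chance that at least one of k samples passes. A minimal sketch, assuming that is the estimator meant here:

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Every size-k subset of the n samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Spending more test-time FLOPs means drawing more samples (larger k), so a model whose samples are more diverse can reach a given pass@k with fewer of them.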
Our approach introduces an algorithm, LintSeq, for sampling across interdependent lines in source code by using a code linter
With LintSeq, we can generate plausible edit *trajectories* for any source code file, covering possible ways of synthesizing its contents edit-by-edit with no linter errors
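A rough sketch of the backward-sampling idea, not the paper's implementation: start from a finished program, delete a random line, then keep deleting whichever line a checker flags until the program is error-free again; reversing the resulting chain of states gives a trajectory that builds the file up from empty. As a loud assumption, `ast.parse` stands in for the linter, so it only catches syntax errors (a real linter would also flag issues like undefined names), and both helper names are hypothetical.

```python
import ast
import random


def first_error_line(lines):
    """Hypothetical stand-in linter: 1-based line of the first syntax error, or None."""
    try:
        ast.parse("\n".join(lines))
        return None
    except SyntaxError as err:
        # lineno can be None, or point just past the end (e.g. unexpected EOF).
        return err.lineno or len(lines)


def sample_edit_trajectory(program, seed=0):
    """States from empty to `program`; each passes the check if `program` does."""
    rng = random.Random(seed)
    current = program.splitlines()
    states = [list(current)]
    while current:
        # Backward step: delete one randomly chosen line...
        del current[rng.randrange(len(current))]
        # ...then greedily delete flagged lines until the checker passes again.
        while (bad := first_error_line(current)) is not None:
            del current[min(max(bad, 1), len(current)) - 1]
        states.append(list(current))
    states.reverse()  # forward order: empty file -> full program
    return states
```

Because every step removes at least one line, the loop always terminates, and each state in the forward trajectory is a subsequence of the next, so consecutive states differ only by inserted lines.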
Our paper showing that LMs benefit from human-like abstractions for code synthesis was accepted to ICLR! 🇸🇬
We show that order matters in code gen: casting code synthesis as a sequential edit problem by preprocessing examples in SFT data improves LM test-time scaling laws
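To make the preprocessing step concrete, here is one hypothetical way to serialize a trajectory of program states into a single string of diffs that could serve as an SFT target. The `<|edit|>` separator and the choice of `difflib` unified diffs are assumptions for illustration, not the paper's exact format.

```python
import difflib

EDIT_SEP = "<|edit|>"  # hypothetical delimiter between consecutive edits


def trajectory_to_target(states):
    """Serialize program states (lists of lines, first typically empty) as a diff sequence."""
    edits = []
    for before, after in zip(states, states[1:]):
        # n=0: no context lines, so each hunk holds only the changed lines.
        diff = list(difflib.unified_diff(before, after, lineterm="", n=0))
        edits.append("\n".join(diff[2:]))  # drop the '---' / '+++' file header lines
    return f"\n{EDIT_SEP}\n".join(edits)
```

When each state only adds lines to the previous one, every hunk contains just `@@` headers and `+` lines, so the target reads as a sequence of insertions.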
Can we extend the power of world models beyond just online model-based learning? Absolutely!
We believe the true potential of world models lies in enabling agents to reason at test time.
Introducing DINO-WM: World Models on Pre-trained Visual Features for Zero-shot Planning.
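One common way to plan with a learned world model at test time is to search over action sequences in the model's latent space and score each imagined rollout by its distance to the embedding of a goal observation. The sketch below uses the cross-entropy method (CEM) for that search; `toy_predictor` is a hypothetical linear stand-in for a learned predictor over pre-trained visual features, and we don't claim this matches DINO-WM's exact planner.

```python
import random


def toy_predictor(z, a):
    """Hypothetical stand-in for a learned latent dynamics model: z_next = z + a."""
    return [zi + ai for zi, ai in zip(z, a)]


def rollout_cost(z0, actions, z_goal, predictor):
    """Roll the model forward in latent space; squared distance of final latent to goal."""
    z = z0
    for a in actions:
        z = predictor(z, a)
    return sum((zi - gi) ** 2 for zi, gi in zip(z, z_goal))


def cem_plan(z0, z_goal, predictor, horizon=3, pop=64, n_elite=8, iters=10, seed=0):
    """Cross-entropy method over action sequences, one diagonal Gaussian per step."""
    rng = random.Random(seed)
    dim = len(z0)  # toy assumption: action dimension equals latent dimension
    mu = [[0.0] * dim for _ in range(horizon)]
    sigma = [[1.0] * dim for _ in range(horizon)]
    for _ in range(iters):
        candidates = [
            [[rng.gauss(mu[t][d], sigma[t][d]) for d in range(dim)] for t in range(horizon)]
            for _ in range(pop)
        ]
        candidates.sort(key=lambda acts: rollout_cost(z0, acts, z_goal, predictor))
        elites = candidates[:n_elite]
        # Refit the sampling distribution to the lowest-cost action sequences.
        for t in range(horizon):
            for d in range(dim):
                vals = [e[t][d] for e in elites]
                mean = sum(vals) / n_elite
                mu[t][d] = mean
                sigma[t][d] = (sum((v - mean) ** 2 for v in vals) / n_elite) ** 0.5 + 1e-3
    return mu
```

In a model-predictive-control loop you would execute only the first planned action, observe, re-encode, and plan again.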
Williams and Zipser (1989) is a classic one! leech.cybernoid.gr/files/text/p...
Finally finally finally some scaling curves for imitation learning in the large-scale-data regime: arxiv.org/abs/2411.04434
Introducing 🧞Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠.
Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, time for an Open-Endedness starter pack.
go.bsky.app/MdVxrtD