🌍 WorldEngine: Towards the Era of Post-Training for Physical AI
🎯 A post-training framework for Physical AI that systematically addresses the long-tail safety-critical data scarcity problem in autonomous driving.
GitHub: github.com/OpenDriveLab...
Project Page: opendrivelab.com/WorldEngine/
Posts by Kashyap Chitta
🌍 WorldEngine is one of the most exciting projects in AD in recent years!
It's a post-training framework tackling the scarcity of long-tail safety-critical scenarios via a pipeline: scenario mining -> 3DGS reconstruction and dynamic agent control w/ behavior world models -> RL post-training.
Blog, code, and data are up.
We're releasing OVIE, a novel view generation model trained entirely on single images. No multi-view datasets needed.
Given a single image, it generates novel views of any scene in real time, running orders of magnitude faster than competing approaches.
MolmoBot, our open robotic manipulation suite trained entirely in simulation, now has code, training data, a data generation pipeline, & evals all available.
This puts our robotics models within reach of any research lab—no extensive real-world data collection required. 🧵
Following the success of the EurIPS and NeurIPS-Mexico City pilots in 2025, we are thrilled to announce two official NeurIPS 2026 satellite events for this year!
These will be held in Paris, France and Atlanta, USA, running alongside the main conference in Sydney, Australia.
Naver AI has put the "World" into "World Models", at least at metropolis scale: a world model for Seoul built on Naver Maps (the Seoul Capital Area is home to 25.6 million people, btw).
seoul-world-model.github.io
arxiv.org/abs/2603.15583
@jnhwkim.bsky.social
Today, a step forward in open robotics: our results show that zero-shot sim-to-real transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces. 🧵
Interested in ✨world models✨? I just open-sourced an implementation of the Dreamer 4 world model. It's in PyTorch and comes with a pretrained model + a neat little web interface that lets you interact with any of 30 DMControl tasks that I trained it on!
Link: github.com/nicklashanse...
cseweb.ucsd.edu/~tzli/novelt...
I gave an internal talk at UCSD last year about "novelty" in computer science research. In it I "debunked" some of the myths people seem to hold about what counts as good research in computer science these days. People seemed to like it, so I thought I'd share.
Simple and efficient transformer-based end-to-end driving. If we could give out another innovation award for NAVSIM, it would go to this work!
valeoai.github.io/driving-on-r...
While we also hype the moderation features, I'm really excited about the paper discovery tools @mariaa.bsky.social and I are starting to build. Open social means we can bootstrap onto existing discussions.
Wrapping up 2025 with a review of some recent work, led by several amazing students and collaborators: Yixuan Pan, Ruoyi Qiao, @jiazhiyang.bsky.social, Shuhan Tan, Brayden Zhang, Shihao Li, @longpollehn.bsky.social, Peter Karkus, and @maxigl.bsky.social!
kashyap7x.substack.com/p/2025-resea...
Our new E2E driving method, TransFuser v6, is out on arXiv.
It outperforms all other methods on CARLA by a wide margin: 95 DS on Bench2Drive!
We show that minimizing the asymmetry between the data annotator and the policy is key to strong IL results.
Code, models, and paper:
ln2697.github.io/lead/
🧥 Live-streamed robotic teamwork that folds clothes: six garments in three minutes straight.
χ₀ = 20 hrs of data + 8 A100s + 3 key insights:
- Mode Consistency: align your distributions
- Model Arithmetic: merge, don't retrain
- Stage Advantage: pivot wisely
🔗 mmlab.hk/research/kai0 (check out the 3-min demo)
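The "Model Arithmetic: merge, don't retrain" insight above presumably refers to weight-space checkpoint merging ("model soup" style averaging). A minimal hedged sketch of that idea — not χ₀'s actual method, and with hypothetical parameter names:

```python
import numpy as np

def merge_models(state_dicts, weights=None):
    """Average several checkpoints in weight space instead of retraining.

    state_dicts: list of {param_name: np.ndarray} with identical keys/shapes.
    weights: optional per-model mixing coefficients (default: uniform).
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    assert abs(sum(weights) - 1.0) < 1e-8, "mixing weights must sum to 1"
    merged = {}
    for name in state_dicts[0]:
        # Elementwise convex combination of the same parameter across models.
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two toy "checkpoints" with a single (hypothetical) weight matrix each.
ckpt_a = {"layer.w": np.ones((2, 2))}
ckpt_b = {"layer.w": 3 * np.ones((2, 2))}
soup = merge_models([ckpt_a, ckpt_b])                   # uniform average -> all 2s
biased = merge_models([ckpt_a, ckpt_b], [0.75, 0.25])   # weighted average -> all 1.5s
```

The appeal is cost: merging is a single pass over the parameters, so specialized checkpoints (e.g., per training stage) can be combined without any further gradient steps.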
What's left to do in self-driving given Waymo is taking off? An argument that it's still a great research problem:
open.substack.com/pub/emergere...
Speaking of RL, Nvidia also just published a survey on the importance of closed-loop training (RL, etc.) in E2E driving.
research.nvidia.com/publication/...
Attending #NeurIPS2025? Get your personalized Scholar Inbox conference program now to easily navigate the poster sessions and find what you are looking for:
www.scholar-inbox.com/conference/n...
I'll be at #NeurIPS2025 in San Diego from Thu to Sat, and I am looking for postdocs in Embodied AI, particularly in world modeling and simulator learning. Please reach out if you are interested.
Wondering how DeepSeek v3.2 rivals SOTA models (e.g., GPT-5, Gemini 3 Pro) while being ~30x cheaper? 🤔
Let's learn how the base model works!
We'll focus on attention, the need for KV caching, and key ideas for improving attention (MQA/GQA/MLA/DSA).
youtu.be/Y-o545eYjXM
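The two core ideas mentioned above can be illustrated in a few lines. A toy NumPy sketch under my own assumptions (not DeepSeek's implementation): a KV cache means each decode step appends one key/value per kv-head instead of recomputing the whole prefix, and grouped-query attention (GQA) lets several query heads share each cached kv-head, with MQA as the single-kv-head special case:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gqa_decode_step(q, k_new, v_new, cache, group_size):
    """One decoding step of grouped-query attention with a KV cache.

    q:      (n_q_heads, d)   query for the current token
    k_new:  (n_kv_heads, d)  key for the current token
    v_new:  (n_kv_heads, d)  value for the current token
    cache:  dict with "k", "v" of shape (n_kv_heads, t, d); grown by one step
    group_size: n_q_heads // n_kv_heads (MQA is the n_kv_heads == 1 case)
    """
    # Append this token's K/V: an O(1) update instead of recomputing the prefix.
    cache["k"] = np.concatenate([cache["k"], k_new[:, None, :]], axis=1)
    cache["v"] = np.concatenate([cache["v"], v_new[:, None, :]], axis=1)
    n_q_heads, d = q.shape
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group_size                          # several q-heads share one kv-head
        scores = cache["k"][kv] @ q[h] / np.sqrt(d)   # (t,)
        out[h] = softmax(scores) @ cache["v"][kv]     # (d,)
    return out

# 4 query heads sharing 2 kv-heads (group_size = 2), head dim 8.
rng = np.random.default_rng(0)
cache = {"k": np.zeros((2, 0, 8)), "v": np.zeros((2, 0, 8))}
for _ in range(3):  # decode three tokens; the cache grows by one K/V per step
    y = gqa_decode_step(rng.normal(size=(4, 8)),
                        rng.normal(size=(2, 8)),
                        rng.normal(size=(2, 8)), cache, group_size=2)
```

Shrinking the number of kv-heads shrinks the cache proportionally, which is the main serving win of MQA/GQA over full multi-head attention.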
🚀 Introducing TMLR Beyond PDF!
🎬 This is a new HTML-based submission format for TMLR that supports interactive figures and videos alongside the usual LaTeX and images.
🎉 Thanks to TMLR Editors in Chief: Hugo Larochelle, @gautamkamath.com, Naila Murray, Nihar B. Shah, and Laurent Charlin!
TMLR (@tmlrorg.bsky.social) is now proud to support interactive HTML-based submissions, going "Beyond PDF" -- check it out!
Thanks to Paul Vicol (@paulvicol.bsky.social) for his tireless work on this new option, as well as the OpenReview team.
Excellent speaker lineup for the @naverlabseurope.bsky.social AI for Robotics Workshop.
For those at home, the event is live-streamed on the landing page: europe.naverlabs.com/updates/ai4r...
We’re live! 🚀 Streaming: tinyurl.com/bdtk2nzs
The International Workshop on AI4Robotics by @naverlabseurope
2 days of Spatial AI, SLAM, robot learning, HRI, autonomy
This AM CET: @martinhumenberger.bsky.social @marcpollefeys.bsky.social Andrea Vedaldi Cordelia Schmid & @andrewdavidson.bsky.social ⬇️
A fascinating and historic panel discussion with six of the recipients of the 2025 Queen Elizabeth Prize for Engineering, honoring the critical interplay between algorithms, data, and compute that gave rise to today's remarkable advances in AI and machine learning.
Launching the Physical AI AV Dataset! 🚀
huggingface.co/datasets/nvi...
One of the largest, most diverse & commercially usable open-source datasets for AVs.
- 1727 hours of driving data
- Camera, LiDAR, & radar
- 25 countries, 2500+ cities
This is just the beginning, more features to come!