🏆 Looking ahead: We are working on hosting a competition in 2026. We want to see different policies and hardware setups compete head-to-head in the same arena. Let's put them to the real test!
Posts by Chonghao Sima
We’ve also released some "Quality of Life" tools to streamline your workflow: ⚡ Ultra-fast norm stats computation (significantly faster than the official LeRobot implementation!) 🛠️ Micro-tools for LeRobot dataset manipulation 🎮 DAgger support
Our repo includes code, data, hardware manuals, and inference setups for AgileX (Songling) and Ark (rolling out gradually). We really hope to bring the reproducibility standards of the CV community into this space. 🔧
Huge shoutout to the entire team and everyone involved behind the scenes! 🙌
KAI0 is going fully open-source this week. 🚀
📄 Paper: arxiv.org/abs/2602.09021 💻 Code: github.com/OpenDriveLab...
[5/5] Bottom Line
• Not all robot data is equally valuable
• Fast iteration > brute-force scaling
• Weight-space merging can outperform joint training
• Stage-aware advantage estimation helps long-horizon tasks
📄 Full report: Q1 2026
📦 Data + checkpoints + challenge: 2026
[4/5] Problem: Long-Horizon Credit Assignment
6-minute tasks. Which actions actually helped?
Solution → Stage Advantage:
• Decompose into semantic stages
• Predict advantage directly (not value-diff)
• Smoother supervision, less error compounding
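The stage-advantage idea above can be sketched roughly like this: compute return-to-go per timestep, then baseline each timestep against its own semantic stage instead of differencing a learned value function. This is a minimal illustration under assumptions of mine (per-stage mean return-to-go as the baseline); the actual KAI0 formulation predicts advantages with a learned head and may differ.

```python
from collections import defaultdict

def returns_to_go(rewards, gamma=0.99):
    """Discounted return-to-go at every timestep."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def stage_advantages(rewards, stage_ids, gamma=0.99):
    """Advantage relative to a per-stage baseline (assumed scheme, not the
    paper's exact one): subtract the mean return-to-go of each stage."""
    rtg = returns_to_go(rewards, gamma)
    sums, counts = defaultdict(float), defaultdict(int)
    for g, s in zip(rtg, stage_ids):
        sums[s] += g
        counts[s] += 1
    return [g - sums[s] / counts[s] for g, s in zip(rtg, stage_ids)]
```

Targets like these could then supervise an advantage head directly, which is what makes the signal smoother than chaining value differences over a 6-minute episode.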
[3/5] Problem: Expensive Iteration
Collect new data → Retrain everything → Repeat
Slow and expensive.
Solution → Model Arithmetic:
• Train only on new data
• Merge via weight interpolation
• Merged model > full-dataset model
Models trained separately preserve distinct modes.
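The merge step above can be sketched as plain linear interpolation between two checkpoints with matching parameters. This is a minimal sketch: the parameter names, the interpolation coefficient, and any per-layer weighting in the real pipeline are assumptions on my part.

```python
import numpy as np

def merge_checkpoints(base, finetuned, alpha=0.5):
    """Linearly interpolate two state dicts with matching keys/shapes:
    theta_merged = (1 - alpha) * theta_base + alpha * theta_finetuned."""
    assert base.keys() == finetuned.keys(), "checkpoints must share parameters"
    return {k: (1.0 - alpha) * base[k] + alpha * finetuned[k] for k in base}

# Toy example with two tiny "checkpoints" (hypothetical parameter name).
base = {"layer.w": np.array([0.0, 2.0])}
new = {"layer.w": np.array([2.0, 4.0])}
merged = merge_checkpoints(base, new, alpha=0.5)
```

The appeal is exactly what the post says: you only pay for training on the new data, and the interpolation itself is essentially free.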
[2/5] Problem: Distribution Mismatch
Training data ≠ Model behavior ≠ Real-world execution
This gap causes failures.
Solution → Mode Consistency:
• DAgger for failure recovery
• Augmentation for coverage
• Inference smoothing for clean execution
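The "inference smoothing" bullet can be illustrated with a simple exponential moving average over successive predicted actions, so per-step jitter from the policy never reaches the robot. This is an illustrative sketch only; the actual KAI0 smoothing scheme (e.g., temporal ensembling of action chunks) may differ.

```python
def smooth_actions(actions, beta=0.7):
    """Exponentially smooth a stream of 1-D predicted actions:
    s_t = beta * s_{t-1} + (1 - beta) * a_t (beta is a tunable assumption)."""
    smoothed = [float(actions[0])]
    for a in actions[1:]:
        smoothed.append(beta * smoothed[-1] + (1.0 - beta) * float(a))
    return smoothed
```

The same recurrence applies per-dimension for vector-valued actions; higher beta trades responsiveness for cleaner execution.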
🧥 Live-streamed robotic teamwork that folds clothes: 6 garments in 3 minutes straight.
χ₀ = 20hrs data + 8 A100s + 3 key insights:
- Mode Consistency: align your distributions
- Model Arithmetic: merge, don't retrain
- Stage Advantage: pivot wisely
🔗 mmlab.hk/research/kai0 — check out the 3-min demo
@cvprconference.bsky.social
🚀 HERE WE GO! Join us at CVPR 2025 for a full-day tutorial: “Robotics 101: An Odyssey from a Vision Perspective”
🗓️ June 12 • 📍 Room 202B, Nashville
Meet our incredible lineup of speakers covering topics from agile robotics to safe physical AI at: opendrivelab.com/cvpr2025/tut...
#cvpr2025
Thanks for sharing! I will be hosting the workshop for the whole day, and anyone wrestling with current embodied AI trends is welcome to visit, chat, and exchange ideas! We want to hear opposing opinions from vision and robotics people on the topic of autonomy.
When at @cvprconference.bsky.social, a major challenge is how to split yourself across all the super amazing workshops.
I'm afraid to announce that w/ our workshop on "Embodied Intelligence for Autonomous Systems on the Horizon" we will make this choice even harder: opendrivelab.com/cvpr2025/wor... #cvpr2025
Wonderful end-to-end driving benchmark! We are getting **closer and closer** to **closed-loop** evaluation in the real world!
@katrinrenz.bsky.social @kashyap7x.bsky.social @andreasgeiger.bsky.social @hongyang.bsky.social @opendrivelab.bsky.social
DriveLM hit 1k stars on GitHub, my first project to reach such a milestone. Great thanks to all my collaborators who contributed so much to this project, and many thanks to the community members who participated and brought better insights to this dataset. I hope this is not the end!
Fun fact: the second character in my last name is 🐎 as well.
Thanks for sharing! We've long wanted to know whether we can improve an e2e planner with limited but online data and compute, since performance seems to plateau with more training data. However, online failure cases remain unexplored, as they couldn't directly contribute to model performance under previous training schemes.
Random thoughts today: in humanoid research the methodology is basically decided by the final tasks/demo you would like to show off.
🌟 Previewing the UniAD 2.0
🚀 A milestone upgrade on the codebase of the #CVPR2023 best paper UniAD.
👉 Check out this branch github.com/OpenDriveLab..., and we will get you more details soon
🚀 This year, we’re bringing you three thrilling tracks in Embodied AI and Autonomous Driving, with a total prize pool of $100,000! Now get ready and join the competition!
Visit the challenge website: opendrivelab.com/challenge2025
And more on #CVPR2025: opendrivelab.com/cvpr2025
Thanks to all the staff who worked hard to make it happen! We'd love to hear your feedback.
For 1, We may need a "greatest common divisor" among tasks/algorithms/embodiments.
For 2, retargeting seems to be the most critical issue.
For 3, should we follow sample-efficiency RL or VLM-based e2e methods?
Random thoughts (again) on:
1. Benchmark & Evaluation & Metrics
2. Data collection (especially tele-op)
3. Policy network architecture & training recipe.
Random thoughts today: the situation in humanoids today is similar to autonomous driving back in 2020 or so: different hardware setups, a preference for RL-based planning and sim2real deployment, etc. Will humanoids follow a development curve similar to driving's?
We implemented undo in @rerun.io by storing the viewer state in the same type of in-memory database we use for the recorded data. Have a look (sound on!)