This week, at #ICLR2026, we are presenting:
RAP: 3D Rasterization Augmented End-to-End Planning
Project page w/ code: alan-lanfeng.github.io/RAP/
Collab with the VITA lab at EPFL
@iclr-conf.bsky.social #Bench2Drive #navsim
WorldEngine is one of the most exciting autonomous-driving projects of the past few years!
It's a post-training framework that tackles the scarcity of long-tail, safety-critical scenarios: scenario mining -> 3DGS reconstruction with dynamic agent control via behavior world models -> RL post-training.
Blog, code and data are up
Our visiting PhD student @bliberatori.bsky.social
gave us a great talk on "Video understanding beyond trimmed clips: temporal and comparative reasoning".
Make sure to have a look at her excellent work in this area: benedettaliberatori.github.io
1/n New paper - V-GIFT
Self-supervised tasks like rotation prediction or colorization were big in 2018.
Do they still matter?
Yes.
We turn them into visual instruction tuning data for MLLMs.
Result: models rely more on the image and perform better on vision tasks.
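A toy illustration of the recipe (the prompt wording and sample format below are our own assumptions, not necessarily V-GIFT's): a classic rotation-prediction pretext task recast as a visual instruction-tuning sample.

```python
# Toy illustration: recasting rotation prediction as visual instruction-tuning data.
# Prompt wording and sample format are hypothetical, not V-GIFT's exact recipe.
import random
from PIL import Image

ROTATIONS = [0, 90, 180, 270]

def make_rotation_instruction(image: Image.Image) -> dict:
    """Rotate the image by a random multiple of 90 degrees and phrase the task as a QA pair."""
    angle = random.choice(ROTATIONS)
    rotated = image.rotate(angle, expand=True)
    return {
        "image": rotated,
        "question": "By how many degrees has this image been rotated: 0, 90, 180, or 270?",
        "answer": str(angle),
    }

# sample = make_rotation_instruction(Image.open("photo.jpg"))  # one (image, Q, A) sample
```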
Farewell to EXA4MIND | David Hurych from @valeoai.bsky.social, the industrial application case leader of EXA4MIND.
As EXA4MIND draws to a close, we have brought together consortium members to discuss the project's impact over the last three years.
⬇️
www.youtube.com/watch?v=mV-Y...
Congrats to our team member Marc Lafon for being awarded the 2026 AFRIF PhD prize! afrif.irisa.fr?page_id=54
Congrats Dr. @vletzelter.bsky.social for defending his work in front of a stellar jury and for answering so many questions that @rflamary.bsky.social called for a break at some point ^^
Fun fact: Victor had not 4, not 6, but *8* advisors over the span of his thesis and he handled them brilliantly
Congrats to our @vletzelter.bsky.social on earning his PhD with his work on "Multiple Choice Learning from Ambiguous Signals".
Victor is a resourceful, curious, and kind researcher. Good luck with what's next!
Fruit of an excellent collaboration between @telecomparis.bsky.social & @valeoai.bsky.social
I'm totally biased on this but I think it's wonderful that we now have official NeurIPS parallel satellite events. Parallelization was the trick that allowed us to scale to large data and models. It makes sense to try it for conferences as the community grows.
🚨 Happening NOW!
Our PhD student Victor Letzelter @vletzelter.bsky.social is currently defending his thesis at Telecom Paris!
Good luck, Victor! #PhDDefense #TelecomParis
LOSC: LiDAR Open-voc Segmentation Consolidator
🚨 oral 🚨
tl;dr: 1) project 2D VLM predictions into 3D space, 2) refine them with spatio-temporal and augmentation consistency, 3) train a robust 3D network on the consolidated labels (minimal sketch below).
by N. Samet, @gillespuy.bsky.social, and R. Marlet
📄: arxiv.org/abs/2507.07605
💻: ⏳
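A hypothetical sketch of the consolidation idea (function names and the simple majority vote are our assumptions, not the authors' code): LiDAR points inherit 2D VLM labels via camera projection, then a vote across frames/augmentations yields pseudo-labels for training a 3D network.

```python
# Hypothetical sketch: project LiDAR points into the image to inherit 2D VLM labels,
# then majority-vote across frames/augmentations. Not the authors' implementation.
import numpy as np

def project_to_image(points_xyz, K, T_cam_from_lidar):
    """Project Nx3 LiDAR points to pixel coordinates; returns integer (u, v) and a validity mask."""
    pts_h = np.c_[points_xyz, np.ones(len(points_xyz))]   # homogeneous coordinates
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]            # LiDAR frame -> camera frame
    in_front = cam[:, 2] > 0.1                             # keep points in front of the camera
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    return uv.astype(int), in_front

def consolidate_votes(votes_per_point, num_classes, min_votes=2):
    """Majority-vote the accumulated 2D labels per 3D point; low-consensus points get -1 (ignored)."""
    counts = np.zeros((len(votes_per_point), num_classes), dtype=int)
    for i, votes in enumerate(votes_per_point):
        for c in votes:
            counts[i, c] += 1
    labels = counts.argmax(axis=1)
    labels[counts.max(axis=1) < min_votes] = -1
    return labels
```

The consolidated labels would then supervise a standard 3D segmentation network.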
Is clustering enough for LiDAR instance segmentation? A state-of-the-art training-free baseline
SOTA LiDAR instance segmentation by clustering (no training)
by C. Sautier, @gillespuy.bsky.social, @alexandreboulch.bsky.social, R. Marlet, @vincentlepetit.bsky.social
📄: arxiv.org/abs/2503.13203
💻: ⏳
3DV starts TODAY! The valeo.ai team is thrilled to be there to present our latest research on advancing LiDAR segmentation.
#3DV #3DV2026 #3D #LiDAR
Check out the posts below for details! 👇
Congratulations to our researchers Renaud Marlet and @abursuc.bsky.social on their repeated recognition as outstanding reviewers at #CVPR, #ICCV, and #ECCV 👏👏👏
Thank you for your sharp insights, kindness, and dedication. It's key for the field to count on reviewers like you!
ELLIOT partners came together in Modena. 🤝
With 34 partners across 13 countries, the consortium met this week to review progress, discuss challenges and align on next steps, strengthening collaboration across Europe. π
CLIP's visual embedding projector is a few-shot cornucopia
💡 Few-shot adaptation of VLMs by fine-tuning only the last projection matrix of the vision encoder (see the sketch below)
📄 https://arxiv.org/abs/2410.05270
💻 code available ✅
by M. Fahes, @tuanhungvu.bsky.social, @abursuc.bsky.social, @ptrkprz.bsky.social, R. de Charette
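A minimal sketch of the general recipe, assuming the HuggingFace CLIPModel interface; the loss and hyper-parameters here are illustrative, not the paper's exact setup.

```python
# Sketch: few-shot tuning of only the vision-side projection matrix of CLIP.
# The training loop details are illustrative assumptions, not the paper's recipe.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Freeze everything, then unfreeze only the visual projection (the last matrix of the vision encoder).
for p in model.parameters():
    p.requires_grad = False
for p in model.visual_projection.parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(model.visual_projection.parameters(), lr=1e-4)

def few_shot_step(images, texts):
    """One optimization step on a handful of labeled (image, class-name prompt) pairs."""
    inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    # Contrastive-style loss: match each image to its own text prompt.
    labels = torch.arange(outputs.logits_per_image.size(0))
    loss = torch.nn.functional.cross_entropy(outputs.logits_per_image, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Only a single small matrix is updated, which keeps the adaptation cheap and well suited to few-shot regimes.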
Tune in to the live PhD defense of Loïck Chambon on "Efficient Representations for Autonomous Driving", a PhD co-advised by @mlia-isir.bsky.social & @valeoai.bsky.social
www.youtube.com/live/uFMAHn0...
Energized and inspired after our annual meetup to brainstorm on new exciting ideas and plan the projects ahead of us this year!
As always, this is an excellent occasion to fit (almost) the entire team into a single photo.
IPA: An Information-Reconstructive Input Projection Framework for Efficient Foundation Model Adaptation
Yuan Yin, Shashanka Venkataramanan, Tuan-Hung Vu, Andrei Bursuc, Matthieu Cord
Action editor: Ofir Lindenbaum
https://openreview.net/forum?id=aLmQeZx2pR
#projector #adaptation
The unreasonable magic of simplicity!
Meet DrivoR (Driving on Registers): our latest end2end autonomous driving model.
We tore down the complex dependencies & modules of current models to obtain a pure Transformer-based SOTA driving agent (NAVSIM v1 & v2, HUGSIM).
Find out more 👇
7/ Read the paper & get the code: valeoai.github.io/driving-on-r...
Congratulations to the whole team!
6/ Furthermore, this scoring architecture allowed us to tweak the agent's behavior.
We were able to induce a more passive, safer driving style, which proved important for reaching SOTA performance on the rigorous NAVSIM-v2 benchmark. 🛡️
5/ Given the success of trajectory scoring methods (like GTRS), we dove deep into the scoring module.
Thanks to the wizardry of Yihong Xu, we discovered that disentangling the tokens used for generation from those used for scoring was key.
4/ This mimics human driving intuition! 🧠
We pay max attention to the road ahead (front camera), while only occasionally glancing at the rear (back camera).
Visualizing the attention maps confirms this: front tokens specialize; back tokens collapse to a single pattern.
3/ These registers act as "scene-tokens" and demonstrate signs of learned compression.
Cosine similarity analysis reveals high differentiation for the front camera, while representations progressively "collapse" as we move toward the back camera.
2/ We explored the specific benefits of using a pre-trained ViT as the image encoder.
We imbue DINOv2 with registers, LoRA-finetuned on driving data, reducing the number of patch tokens by over 250x using camera-aware register tokens (minimal sketch below).
This efficiency could benefit future work on VLMs for driving.
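A hypothetical sketch of what camera-aware register compression could look like (a small cross-attention pooling per camera); module names and sizes are our assumptions, not the DrivoR implementation.

```python
# Hypothetical sketch: compress per-camera ViT patch tokens into a few learned,
# camera-aware register tokens via cross-attention. Not the DrivoR code.
import torch
import torch.nn as nn

class CameraRegisterCompressor(nn.Module):
    def __init__(self, dim=768, num_registers=4, num_cameras=8, num_heads=8):
        super().__init__()
        # One small set of learnable register tokens per camera ("camera-aware").
        self.registers = nn.Parameter(torch.randn(num_cameras, num_registers, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_tokens, cam_idx):
        # patch_tokens: (B, N_patches, dim) from a frozen/LoRA-tuned ViT such as DINOv2
        # cam_idx: index of the camera these tokens come from
        B = patch_tokens.size(0)
        queries = self.registers[cam_idx].unsqueeze(0).expand(B, -1, -1)
        compressed, _ = self.attn(queries, patch_tokens, patch_tokens)
        return self.norm(compressed)  # (B, num_registers, dim): hundreds of patches -> a few registers

# Example: a handful of registers per camera feed the planner instead of thousands of patch tokens.
compressor = CameraRegisterCompressor()
tokens = torch.randn(2, 256, 768)          # dummy patch tokens for one camera
scene_tokens = compressor(tokens, cam_idx=0)
print(scene_tokens.shape)                   # torch.Size([2, 4, 768])
```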
1/🧵 Q: Can we have both a simple and SOTA architecture in autonomous driving?
A: Yes!
Introducing Driving on Registers (DrivoR):
a pure Transformer backbone that achieves SOTA results in NAVSIM v1 / v2 and closed-loop HUGSIM evaluation.
Here is how 👇
Our @spyrosgidaris.bsky.social is speaking this morning (Wed, Dec 10th, 11:00 am Paris time) about "Latent Representations for Better Generative Image Modeling" in the Hi! PARIS - ELLIS monthly seminar.
The talk will be live-streamed: www.hi-paris.fr/2025/09/26/a...
Perfect timing for this keynote on open, re-purposable foundation models at #aiPULSE2025
@abursuc.bsky.social taking the stage this afternoon!