Hello all! New Preprint Alert!
Code World Models for General Game-Playing.
I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments!
1/N
Posts by Momchil Tomov
Yes! I'm putting out the ad for that early next year. Let's get in touch.
Here are several examples of real-world cut-ins. TreeIRL anticipates the cut-in and brakes comfortably, while the other baselines either brake too late or brake uncomfortably (see inset history of vehicle kinematics).
TreeIRL achieves a 1-2 orders of magnitude improvement in safety, while also improving comfort and progress! On the road, it is by far the best planner.
We compare TreeIRL against multiple classical and SOTA planners in 7000+ nuPlan simulations. But the most exciting result is from deploying and evaluating the planners on real self-driving cars in Las Vegas.
We feed the MCTS trajectories into a deep scoring function trained with IRL to choose the most human-like among them.
The IRL network is trained on many hours of human expert demonstrations to effectively reverse-engineer the intrinsic reward function of human driving.
MCTS uses search + ML to efficiently explore combinatorially large search spaces. In most applications (e.g. AlphaGo), MCTS outputs a single next best action.
The main innovation is to repurpose MCTS to output a *set of possible sequences* of actions (i.e., trajectories).
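For readers who want the shape of the idea in code, here is a toy sketch of the two stages described in these posts: propose a set of safe candidate trajectories, then let a scoring function pick one. It is not the paper's implementation. Exhaustive enumeration stands in for MCTS, a hand-written comfort penalty stands in for the learned IRL scorer, and every name and constant (`ACTIONS`, `DT`, `rollout`, `candidate_trajectories`, `toy_score`, `plan`) is hypothetical.

```python
from itertools import product

ACTIONS = (-2.0, 0.0, 1.0)  # hypothetical accelerations in m/s^2
DT = 1.0                    # hypothetical step length in seconds


def rollout(speed, accels):
    """Integrate a 1-D point mass; return a list of (position, speed) states."""
    pos, states = 0.0, []
    for a in accels:
        speed = max(0.0, speed + a * DT)  # the vehicle cannot reverse
        pos += speed * DT
        states.append((pos, speed))
    return states


def candidate_trajectories(speed, obstacle_pos, horizon=3, k=5):
    """Stand-in for the tree search (NOT real MCTS): enumerate every action
    sequence, drop those that reach the obstacle, keep the k with most progress."""
    safe = []
    for accels in product(ACTIONS, repeat=horizon):
        states = rollout(speed, accels)
        if all(pos < obstacle_pos for pos, _ in states):
            safe.append((accels, states))
    safe.sort(key=lambda c: c[1][-1][0], reverse=True)
    return safe[:k]


def toy_score(accels, states):
    """Stand-in for the learned scorer: penalize harsh accelerations."""
    return -sum(a * a for a in accels)


def plan(speed, obstacle_pos, score_fn=toy_score):
    """Return the highest-scoring safe action sequence, or None if none is safe."""
    candidates = candidate_trajectories(speed, obstacle_pos)
    if not candidates:
        return None
    return max(candidates, key=lambda c: score_fn(*c))[0]
```

Calling `plan(5.0, 100.0)` returns one acceleration sequence chosen from the top-k safe candidates. Swapping `score_fn` for a trained model is the point where a learned notion of human-likeness would enter.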
Why it matters (cont'd):
Flexible framework that can be extended with imitation learning and reinforcement learning.
Underscores the importance of diverse metrics and real-world evaluation.
Why this matters:
First real-world evaluation of an MCTS-based planner on public roads.
Comprehensive comparison across simulation and **500+ miles of urban driving** in Las Vegas.
Beats classical + SOTA planners, balancing safety, progress, comfort, and human-likeness.
The key idea is to use Monte Carlo tree search (MCTS) to find a promising set of safe candidate trajectories and inverse reinforcement learning (IRL) to choose the most human-like trajectory among them.
Read the full paper here --> arxiv.org/abs/2509.13579
Excited to share a new preprint based on my work this past year:
**TreeIRL** is a novel planner that combines classical search with learning-based methods to achieve state-of-the-art performance in simulation and in **real-world autonomous driving**!