Posts by Momchil Tomov

Hello all! 👋 🚨 New Preprint Alert! 🚨

Code World Models for General Game-Playing. ♟️🎲 ♣️♥️♠️♦️

I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments!

🧡 1/N

6 months ago 55 9 2 4

Yes! I'm putting out the ad for that early next year. Let's get in touch.

4 months ago 1 0 1 0

Here are several examples of real-world cut-ins. TreeIRL anticipates the cut-in and brakes comfortably, while the other baselines either brake too late or brake uncomfortably (see inset history of vehicle kinematics).

7 months ago 3 0 0 0

TreeIRL achieves a 1-2 orders of magnitude improvement in safety, while also improving comfort and progress! On the road, it is by far the best planner.

7 months ago 3 0 1 0

We compare TreeIRL against multiple classical and SOTA planners in 7000+ nuPlan simulations. But the most exciting result is from deploying and evaluating the planners on real self-driving cars in Las Vegas.

7 months ago 3 0 2 0

We feed the MCTS trajectories into a deep scoring function trained with IRL to choose the most human-like among them.

The IRL network is trained on many hours of human expert demonstrations to effectively reverse-engineer the intrinsic reward function of human driving.
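
To make the selection step concrete, here is a minimal sketch, not the actual TreeIRL scorer. It assumes each candidate trajectory is just a tuple of accelerations, and it swaps the deep IRL-trained network for a hand-written `toy_score` with made-up weights. The names `Trajectory`, `toy_score`, and `select_trajectory` are all hypothetical.

```python
from typing import Callable, Sequence

# Assumption: a trajectory is one acceleration value (m/s^2) per time step.
Trajectory = Sequence[float]


def toy_score(traj: Trajectory) -> float:
    """Stand-in for a learned scorer: prefers gentle, smooth acceleration."""
    harshness = max(abs(a) for a in traj)
    jerk = sum(abs(b - a) for a, b in zip(traj, traj[1:]))
    # Hand-picked weights for illustration only; a real scorer would be learned.
    return -(1.0 * harshness + 0.5 * jerk)


def select_trajectory(
    candidates: Sequence[Trajectory],
    score: Callable[[Trajectory], float],
) -> Trajectory:
    """Return the candidate with the highest score."""
    return max(candidates, key=score)


candidates = [
    (0.0, -3.0, -3.0, 0.0),    # late, hard braking
    (-1.0, -1.0, -1.0, -1.0),  # early, gentle braking
    (-2.0, 0.0, -2.0, 0.0),    # oscillating braking
]
best = select_trajectory(candidates, toy_score)
```

Because the scorer is passed in as a plain callable, a learned model can replace `toy_score` without changing the selection code.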

7 months ago 3 0 1 0

MCTS uses search + ML to efficiently explore combinatorially large search spaces. In most applications (e.g. AlphaGo), MCTS outputs a single next best action.

The main innovation is to repurpose MCTS to output a *set of possible sequences* of actions (i.e., trajectories).
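
A toy sketch of that idea, under loud assumptions: this is a generic UCB-based MCTS over a made-up 1-D problem, not the planner from the paper. The action set, horizon, reward (`toy_return`), and the "most-visited leaves" extraction rule are all hypothetical choices for illustration.

```python
import math
import random

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical acceleration choices
HORIZON = 4                 # actions per trajectory


class Node:
    def __init__(self, actions=()):
        self.actions = actions  # action sequence from the root to this node
        self.children = {}      # action -> Node
        self.visits = 0
        self.total = 0.0        # sum of rollout returns


def toy_return(actions, target_speed=2.0):
    """Hypothetical reward: reach a target speed change with little effort."""
    effort = sum(abs(a) for a in actions)
    return -abs(sum(actions) - target_speed) - 0.1 * effort


def ucb(parent, child, c=1.4):
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node, path = root, [root]
        # Selection and expansion: add at most one new node per iteration.
        while len(node.actions) < HORIZON:
            if len(node.children) < len(ACTIONS):
                a = ACTIONS[len(node.children)]
                node.children[a] = Node(node.actions + (a,))
                node = node.children[a]
                path.append(node)
                break
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(parent, ch))
            path.append(node)
        # Rollout: complete the sequence with random actions.
        tail = tuple(rng.choice(ACTIONS) for _ in range(HORIZON - len(node.actions)))
        value = toy_return(node.actions + tail)
        # Backpropagation along the visited path.
        for n in path:
            n.visits += 1
            n.total += value
    return root


def candidate_trajectories(root, k=5):
    """Return up to k full-length action sequences, most-visited first."""
    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        if len(node.actions) == HORIZON:
            leaves.append(node)
        stack.extend(node.children.values())
    leaves.sort(key=lambda n: n.visits, reverse=True)
    return [n.actions for n in leaves[:k]]


root = search()
trajectories = candidate_trajectories(root)
```

Instead of returning only the best first action at the root, `candidate_trajectories` walks the finished tree and returns several complete action sequences, which a separate scorer can then rank.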

7 months ago 3 0 1 0

Why it matters (cont'd):

🧩 Flexible framework that can be extended with imitation learning and reinforcement learning.

‼️ Underscores importance of diverse metrics and real-world evaluation.

7 months ago 3 0 1 0

Why this matters:

🛣️ First real-world evaluation of MCTS-based planner on public roads.

📊 Comprehensive comparison across simulation and **500+ miles of urban driving** in Las Vegas.

🏆 Beats classical + SOTA planners, balancing safety, progress, comfort, and human-likeness.

7 months ago 3 0 1 0
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning

We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation a...

💡 The key idea is to use Monte Carlo tree search (MCTS) to find a promising set of safe candidate trajectories and inverse reinforcement learning (IRL) to choose the most human-like trajectory among them.

Read the full paper here --> arxiv.org/abs/2509.13579

7 months ago 4 0 1 0

Excited to share a new preprint based on my work this past year:

**TreeIRL** is a novel planner that combines classical search with learning-based methods to achieve state-of-the-art performance in simulation and in **real-world autonomous driving**! 🚘 🤖 🚀

7 months ago 27 6 1 0
Competitive integration of time and reward explains value-sensitive foraging decisions and frontal cortex ramping dynamics

Bukwich and Campbell et al. show that mice integrate elapsed time and reward intake, scaled by a latent patience variable, to decide when to leave virtual “patches.” Frontal cortex ramping activity ma...

Our paper on foraging is now published in Neuron! Read it here:

www.cell.com/neuron/fullt...

This project was co-led by Michael Bukwich (not on Bluesky) and me, with major contributions from all co-authors. Huge thanks to the whole team!

8 months ago 83 22 2 1