
Posts by Matej Jusup

Introducing Kaggle Game Arena | Kaggle
Watch models compete in complex games providing a verifiable and dynamic measure of their capabilities

N/N
More on Game Arena and the upcoming chess matchups:
🔗 www.kaggle.com/blog/introdu...

Excited to see more AI evaluations move in this direction.

8 months ago

4/N The first tournament? Chess, with top models like Gemini 2.5 Pro, o3, and DeepSeek-R1. Matches will be covered by Magnus Carlsen, Hikaru Nakamura, and Levy Rozman (GothamChess).

8 months ago

3/N
3. Resistance to benchmark saturation—many games remain unsolved by brute force or memorization
4. Strong emphasis on high-level behaviors: planning, reasoning, memory, adaptation, even deception

8 months ago

2/N By using head-to-head board game matchups, Game Arena offers several advantages over many existing evaluations:
1. Direct comparisons across a range of strategic games
2. Streamed, replayable matches that improve transparency and reproducibility

8 months ago

1/N I’ve long believed that board games should play a bigger role in AI evaluation. They naturally test strategic reasoning, long-term planning, adaptation—and they can’t be solved by brute force or memorization.

Game Arena is transparent, replayable, and tests actual behavioral intelligence.

8 months ago

A year after our trip to AAMAS in New Zealand, @sharky6000.bsky.social came back for more!

I should have planned my year so as not to miss @aamasconf.bsky.social…

Big congrats and keep up the amazing work! 🎉👏

10 months ago
ML Pub Club #22: Superhuman Planning with LLMs · Luma
What happens when a chess champion meets cutting-edge AI? Join us for an evening with Matej Jusup, as he unpacks how large language models (LLMs) can go from…

Looking forward to speaking at the ML Pub Club on June 3rd!

I'll discuss how, during my time at DeepMind, we taught LLMs to play chess at a GM level and the broader implications for strategic AI.

If you're in Zagreb, join us at Mažuranićev trg 13 at 6 PM!

More info & RSVP: lu.ma/erjji5it

11 months ago
Mastering Board Games by External and Internal Planning with Language Models
Advancing planning and reasoning capabilities of Large Language Models (LLMs) is one of the key prerequisites towards unlocking their potential for performing reliably in complex and impactful domains...

A paper from my time at Google was accepted for a spotlight presentation at ICML!

In “Mastering Board Games by External and Internal Planning with Language Models”, we show how language models can achieve grandmaster-level play using a search budget on par with humans.

arxiv.org/abs/2412.12119

11 months ago

Hive (and all of its expansions) has been added to OpenSpiel! 🎉🤩🐝🐜🕷️🐞🦟🪲

From Gen42: "Hive is an award-winning board game with a difference. There is no board. The pieces are added to the playing area thus creating the board. As more and more pieces are added the game becomes a fight to ...

🧵1/5

11 months ago
TURING AWARD WINNER Richard S. Sutton in Conversation with Cam Linke | No Authorities in Science
YouTube video by Amii

www.youtube.com/watch?v=9_Pe... An interview with Rich. His humility is truly inspiring: "There are no authorities in science". I wish people would listen and live by this.

1 year ago

Sim agents are key to developing safety-critical autonomous systems, like self-driving cars.

We're open-sourcing sim agents that achieve a 99.8% success rate with < 0.8% failures on the Waymo Dataset. These agents are built by scaling up self-play.

1 year ago

We've released our lecture notes for the course Probabilistic AI at ETH Zurich, covering uncertainty in ML and its importance for sequential decision making. Thanks a lot to @jonhue.bsky.social for his amazing effort and to everyone who contributed! We hope this resource is useful to you!

1 year ago

It won't, as far as I know. I will share the link here if anything changes.

1 year ago
ZurichAI | Largest ML meetup in Switzerland
ZurichAI is the largest regularly scheduled machine learning meetup in Switzerland. We're in Zurich and host events for NLP, CV & more with 100+ regular attendees.

Join the conversation! We'll cover:
• The innovative search strategies we developed
• The implications of LLMs in strategic domains
• Q&A and networking with fellow AI enthusiasts

🗓️ 20th Feb 2025, 18:00-20:00
📍 Zürich, OAT ETH Zurich (14th floor)
🔗 www.zurichai.ch

1 year ago

LLMs Mastering Board Games: ZurichNLP Meetup - Feb 20th!

Excited to share insights from my student research at Google DeepMind at the upcoming ZurichNLP meetup! I'll present how we achieved high-level play in board games using LLMs with a search budget comparable to human chess grandmasters.

1 year ago
Robust Autonomy Emerges from Self-Play
Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy in another domain. We show that robust and naturalistic drivi...

I've been talking about writing this paper to anyone who would listen since 2020. I bombed a bunch of job talks trying to convince companies to work on this. It's so nice to finally just be able to say, yes, self-play RL in a diverse world gives you immense capabilities
arxiv.org/abs/2502.03349

1 year ago

I am more than happy that @quantamagazine.bsky.social, which I have been reading since the first year of my Bachelor's degree, cited us:
www.quantamagazine.org/chatbot-soft...

More news about this work and a 2nd version are coming soon!

#machinelearning #deeplearning #cs #computerscience #tcs

1 year ago

Pet peeve: Calling something that’s not open source… open source. Open weight != open source

1 year ago
Graphic of statistics on the efficiency or inefficiency of cars

A typical European car is parked 92% of the time. It spends 1/5th of its driving time looking for parking. Its 5 seats only move 1.5 people. 86% of its fuel never reaches the wheels, and most of the energy that does moves the car, not the people.

Sound efficient?

HT @ellenmacarthurfdn.bsky.social

1 year ago

An interesting idea that’s worth keeping an eye on!

1 year ago
2024: A year of extraordinary progress and advancement in AI
As we move into 2025, we’re looking back at the astonishing progress in AI in 2024.

Demis Hassabis, James Manyika, and I wrote up an overview of the AI research work & advances across Google in 2024 (Gemini, NotebookLM, robotics, ML for science, & advances in responsible AI+more). 🎊

Give it a read or paste it into NotebookLM to listen, if you prefer!

blog.google/technology/a...

1 year ago

Check out the 16th Workshop on Optimization and Learning in Multiagent Systems (OptLearnMAS-25) at #AAMAS 2025!

Topics: distributed opt., coalition formation, opt. under uncertainty, winner determination algs in auctions and procurements, algs to compute equilibria in games.

optlearnmas.github.io

1 year ago

In December, I posted about our new paper on mastering board games using internal + external planning. 👇

Here's a talk about it, now on YouTube, given by my awesome colleague John Schultz!

www.youtube.com/watch?v=JyxE...

1 year ago

John's talk is now available online!
www.youtube.com/watch?v=JyxE...

1 year ago

Join John's talk to get insights on our paper on mastering board games with language models!

1 year ago

Just a reminder that the AAMAS Doctoral Consortium deadline is next Friday!

Please consider submitting to this great venue or telling your students about it.

👇

1 year ago
29th BOŠNJACI Open • Round 1
9-round Swiss | 90 min + 30 sec / move | standard | Bošnjaci, Croatia | Saric, Culum, Pap, Zaja

The first round was a hard-fought win against a much lower-rated opponent, but it is a testament to the increased playing quality since the recent global chess boom!

lichess.org/broadcast/29...

1 year ago

After 15 years away from competitive chess, I had forgotten how much thrill and excitement the game brings! ♟️ I decided to play in a tournament with five grandmasters and numerous international, FIDE, and candidate masters.

@lichess.org broadcast: lichess.org/broadcast/29...

1 year ago

If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.

1 year ago

After a slight delay, it is now also out on arXiv: arxiv.org/abs/2412.12119

1 year ago