
Posts by Lance Ying

Lastly, thanks to all my collaborators: @heyodogo.bsky.social Prafull Sharma @kyzhao-ivy.bsky.social @nacloos.bsky.social Kelsey Allen, Tom Griffiths @cocoscilab.bsky.social , Katie M Collins, José Hernández-Orallo, @phillipisola.bsky.social @gershbrain.bsky.social and Josh Tenenbaum

1 month ago 2 0 0 0
AI GameStore - Scalable Evaluation of Machine Intelligence A benchmark platform for evaluating AI agents across browser-based games.

We have released 10 games to the public. Play them or build agents to solve them on our website (aigamestore.org).

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conven...

Please check out our paper on arXiv: arxiv.org/abs/2602.17594


In its current state, the AI GameStore is still relatively primitive, and we are actively expanding it to include more diverse and challenging games.

Yet, we hope it serves as an example and a catalyst for building more general, scalable, open-ended evaluation for machine general intelligence.


We also find that models tend to struggle with games that stress-test memory, planning, and world-model learning.

Additionally, model performance tends to be lower on games that challenge multiple capabilities simultaneously.


As a proof of concept, we generated 100 such games based on the top charts of the Apple App Store and Steam.

Most frontier VLMs today struggle to make progress on these games, reaching less than 10% of the human median score on the majority of games while taking significantly longer to complete them.
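The "10% of human median" comparison can be made concrete with a small calculation. The sketch below is illustrative only: the game names and scores are made up, not taken from the paper, and simply show how a per-game score is normalized against the human median and how the "below 10%" fraction is counted.

```python
# Hypothetical per-game scores, for illustration only (not from the paper).
human_median = {"game_a": 1200, "game_b": 80, "game_c": 45}
model_score = {"game_a": 90, "game_b": 9, "game_c": 50}

def normalized(scores, baseline):
    """Express each model score as a fraction of the human median for that game."""
    return {g: scores[g] / baseline[g] for g in scores}

norm = normalized(model_score, human_median)
# Games where the model reaches less than 10% of the human median score.
below_10pct = sorted(g for g, v in norm.items() if v < 0.10)
print(below_10pct)  # → ['game_a']  (90/1200 = 0.075)
```

Normalizing by the human median rather than the mean keeps the baseline robust to a few expert players with outlier scores.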


Taking a first step toward this vision, we introduce the AI GameStore, a scalable, open-ended platform that uses LLMs with humans in the loop to automatically construct standardized, containerized variants of popular human games from digital gaming platforms for AI evaluation.
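A standardized, containerized game variant implies a uniform interface an agent can be evaluated against, regardless of which game is inside the container. The thread does not show the platform's actual API, so the sketch below is a minimal stand-in: `StubGameEnv` is a toy, gym-style environment (hypothetical names throughout) that only illustrates the evaluation loop an agent harness would run.

```python
# Illustrative sketch only: StubGameEnv and run_episode are hypothetical,
# standing in for whatever interface a containerized game variant exposes.
class StubGameEnv:
    """Toy stand-in for a containerized browser game."""
    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return {"screen": "start", "score": 0}

    def step(self, action):
        # A real env would render the browser and apply the agent's input;
        # here the score simply ticks up once per step until the game ends.
        self.steps += 1
        obs = {"screen": f"frame_{self.steps}", "score": self.steps}
        done = self.steps >= self.max_steps
        return obs, done

def run_episode(env, policy):
    """Run one episode: observe, act, repeat until done; return the final score."""
    obs = env.reset()
    done = False
    while not done:
        obs, done = env.step(policy(obs))
    return obs["score"]

final_score = run_episode(StubGameEnv(), policy=lambda obs: "noop")
print(final_score)  # → 5
```

Keeping the interface this narrow (reset, step, score) is what makes the evaluation scalable: any agent that speaks it can be scored on any game in the store without per-game glue code.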


The Multiverse of Human Games covers nearly all human skills and interests and serves as a vital cultural artifact: by abstracting and containerizing real-world complexities, humanity has collectively created a curriculum for training and preparing individuals to survive in the open world.


The proposition that the set of all conceivable human games serves as a robust proxy for general intelligence is rooted in the teleology of play itself: humans design, engage in, and propagate games to prepare themselves for the multifaceted challenges that they are likely to encounter.


We define a “human game” as a game designed by humans for humans, and argue for the evaluative suitability of the space of all such games people can imagine and enjoy: the Multiverse of Human Games.


Today we present a new framework for measuring human-like general intelligence in machines: studying how, and how well, they play and learn to play all conceivable human games compared to humans. We then propose the AI GameStore, a way to sample from popular human games to evaluate AI models.


How do people flexibly integrate visual and textual information to draw mental inferences about agents they've never met?

In a new paper led by @lanceying.bsky.social, we introduce a cognitive model that achieves this by synthesizing rational agent models on-the-fly -- presented at #EMNLP2025!

5 months ago 28 8 2 0