Lastly, thanks to all my collaborators: @heyodogo.bsky.social Prafull Sharma @kyzhao-ivy.bsky.social @nacloos.bsky.social Kelsey Allen, Tom Griffiths @cocoscilab.bsky.social , Katie M Collins, José Hernández-Orallo, @phillipisola.bsky.social @gershbrain.bsky.social and Josh Tenenbaum
Posts by Lance Ying
We have released 10 games to the public. Play them or build agents to solve them on our website (aigamestore.org).
In its current state, the AI GameStore is still relatively primitive, and we are actively expanding it to include more diverse and challenging games.
Yet, we hope it serves as an example and a catalyst for building more general, scalable, open-ended evaluation for machine general intelligence.
We also find that models tend to struggle with games that stress-test memory, planning, and world-model learning.
Additionally, model performance tends to be lower on games that challenge multiple capabilities simultaneously.
As a proof of concept, we generated 100 such games based on the top charts of the Apple App Store and Steam.
Most frontier VLMs today struggle to make progress on these games, reaching less than 10% of the human median score on the majority of games while taking significantly longer to complete them.
Taking a first step towards this vision, we introduce the AI GameStore, a scalable and open-ended platform that uses LLMs with humans in the loop to automatically construct standardized, containerized variants of popular human games from digital gaming platforms for AI evaluation.
The Multiverse of Human Games covers nearly all human skills and interests and serves as a vital cultural artifact: by abstracting and containerizing real-world complexities, humanity has collectively created a curriculum for training and preparing individuals to survive in the open world.
The proposition that the set of all conceivable human games serves as a robust proxy for general intelligence is rooted in the teleology of play itself: humans design, engage in, and propagate games to prepare themselves for the multifaceted challenges that they are likely to encounter.
We define a “human game” as a game designed by humans for humans, and argue for the evaluative suitability of the space of all such games people can imagine and enjoy -- the Multiverse of Human Games.
Today we present a new framework for measuring human-like general intelligence in machines: studying how, and how well, they play and learn to play all conceivable human games compared to humans. We then propose the AI GameStore, a way to sample from popular human games to evaluate AI models.
How do people flexibly integrate visual & textual information to draw mental inferences about agents they've never met?
In a new paper led by @lanceying.bsky.social, we introduce a cognitive model that achieves this by synthesizing rational agent models on-the-fly -- presented at #EMNLP2025!