
Posts by Seth Karten

There's huge money in providing an API that directly returns BibTeX-formatted citations, and it would fix this issue easily.

Google Scholar doesn't allow this.
Semantic Scholar rate-limits.

Who is building this?

2 weeks ago

I don't even consider this being scooped. It just makes me mad.

3 weeks ago

The new meta seems to be arXiving a rough draft so that you can claim the terminology and claim to be first.

3 weeks ago

Everyone wants to own their own data but no one wants to own their own data center

3 weeks ago

I was going to check out their claims of being “unbiased”. This doesn't bode well.

3 weeks ago
Post image

1) what

3 weeks ago

Can you make a guide explaining who should be going to which location?

3 weeks ago
International Summer School on AI and Games 2026

school.gameaibook.org

4 weeks ago

Agent creativity is one of the most important safety concerns as we increasingly see agent societies emerge in various domains.

1 month ago
The PokeAgent Challenge: Competitive and Long-Context Learning at Scale
We present the PokeAgent Challenge, a large-scale benchmark for decision-making research built on Pokemon's multi-agent battle system and expansive role-playing game (RPG) environment. Partial observa...

arxiv.org/abs/2603.15563

1 month ago
We Ran the Largest AI Pokemon Tournament Ever. Now It's an Open Benchmark.
In 2025, everyone was talking about LLMs playing Pokemon.

open.substack.com/pub/sethkart...

1 month ago

I owe Bluesky a post soon. Everyone, please hold on.

1 month ago
Post image

🚨New preprint! LLM teams are being deployed at scale, yet we lack the tools to predict when they’ll succeed, fail, or how to design them. Distributed computing faced the exact same questions and figured out how to answer them. We show those insights apply directly to LLMs 🧵👇

1 month ago

We are planning to open-source the unique envs (pokejax and tcg jax), since optimized versions of the other envs already exist.

1 month ago
Automatic Generation of High-Performance RL Environments
Translating complex reinforcement learning (RL) environments into high-performance implementations has traditionally required months of specialized engineering. We present a reusable recipe - a generi...

arxiv.org/abs/2603.12145

1 month ago
We Automated RL Environment Engineering for $10
RL environment simulation eats 50-90% of training wall-clock for specialist RL policies. Coding agents can translate them automatically with no sim-to-sim gap.

open.substack.com/pub/sethkart...

1 month ago

I think I accidentally stumbled upon engagement baiting from first principles.

I'll stay on Bluesky as long as the 10 accounts I like to see still post here.

3 months ago

You should make one of those GitHub repos called
Awesome-Multi-agent-Papers,
because this looks like a solid list.

3 months ago

I do appreciate you making this contribution, but it is hard to compete with other platforms that do it in a centralized way.

3 months ago

I already use it, and it doesn't solve the issues: bugs, lack of discoverability, lack of useful recommendations, and an overall worse experience than X or even LinkedIn.

3 months ago
Post image

I think I might leave Bluesky, tbh.

3 months ago

Blanket use of LLMs should not decrease the significance of results. I distrust any researcher who would not use their own product.

4 months ago

Vote em out

4 months ago

Source: x.com/nxthompson/s...

4 months ago
Post image

Personally, I am worried about this effect in disclosure.

4 months ago
Flyer for The PokeAgent Challenge at NeurIPS 2025. Sunday, Dec 7, 8–10:45 AM PST, Mezzanine Room 15AB, San Diego Convention Center. Two tracks: Track 1 (Battling) features competitive Pokémon battle bots; Track 2 (Speedrunning) features long-horizon RPG gameplay. Tagline: "How do we close the gap between specialist RL models and generalist LLM agents?" Speakers: Seth Karten (Princeton), Aaron Traylor, Minmin Chen (Google DeepMind), Jake Grigsby (UT Austin), Stephanie Milani (NYU/Johns Hopkins), Kiran Vodrahalli (Google DeepMind), Fei Fang (CMU), Yuke Zhu (UT Austin), Chi Jin (Princeton). Sponsored by Google DeepMind.

How do we close the gap between specialist RL and generalist LLM agents?

We're benchmarking it in Pokémon. Join us at the PokeAgent Challenge competition workshop @ NeurIPS 2025.

📍 Dec 7, 8AM
🎮 Track 1: Competitive Pokémon (game-theoretic reasoning)
🗺️ Track 2: Speedrunning (long-horizon planning)

4 months ago

Best account to aggregate MAS research

4 months ago

The assumption is not that bad. Additionally, it is not a hard threshold, so the methods will scale as models get better.

4 months ago

EC is partially solved with foundation models. The social settings aren't, and the LLM Economist takeaways are going to be very practical moving forward. If you have aligned agents, many multi-agent problems become simple optimization problems. You just need to train with a scaffold like Claude Code.

4 months ago

These are pretty cool, but I guess nothing ever happened with them? I like the Jersey City Uber Eats robots a lot too.

But we should still be building and deploying things here 100x faster.

4 months ago