Posts by Aron Vallinder

Miraklernas höst by Michael Vallinder (Book) "On a lead-gray day in October 1953, Margareta sits on the ferry between Trelleborg and Sassnitz on her way to Poland. She carries an indomitable longing for the life she believes is hers - a life that will...

My dad’s debut novel is coming out soon! (in Swedish)

www.bokus.com/bok/97891899...

1 year ago 1 0 0 0

3.5yo has taken to the quasi-Moorean “Now I least expect it,” seemingly oblivious to its unassertability

1 year ago 1 0 0 0

Ah oops, my bad!

1 year ago 1 0 0 0

In Search of Lost Time—that way, you’d buy yourself a decent chunk of extra time

1 year ago 0 0 1 0
Claude Cooperates! Exploring Cultural Evolution in LLM Societies, with Aron Vallinder & Edward Hughes (YouTube video by Cognitive Revolution "How AI Changes Everything")

Had a great time talking about our work on cultural evolution and cooperation in LLMs with Nathan Labenz and Ed Hughes

Plenty of work remains in developing evals for cooperation. Please get in touch if interested!

1 year ago 3 0 0 0
Cultural Evolution of Cooperation among LLM Agents Large language models (LLMs) provide a compelling foundation for building generally-capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of indiv...

Paper: arxiv.org/abs/2412.10270

1 year ago 0 0 0 0

This work was done as part of the @pibbssai fellowship. I'm hugely grateful for the opportunity and for the excellent mentorship of @edwardfhughes, without which this would never have happened

1 year ago 0 0 1 0

We see this as a first step toward a new class of LLM benchmarks, focused on the implications of LLM agent deployment for the cooperative infrastructure of society.

1 year ago 0 0 1 0
We plot the average final resources (y-axis) per generation (x-axis) for all five individual runs of each model. Note the different y-axis scales. For Claude 3.5 Sonnet, average final resources vary substantially across runs, especially in later generations. All five runs of GPT-4o show average final resources declining across generations (although in absolute terms the change is tiny). Gemini 1.5 Flash behavior also varies substantially across runs, with several runs showing promising increases before a “cooperation crash”.

We also find substantial variation in behavior across different runs of the same model, suggesting a sensitive dependence on initial conditions.

1 year ago 0 0 1 0
We plot the average final resources across all agents (y-axis) per generation (x-axis) for three different models (Claude 3.5 Sonnet, Gemini 1.5 Flash, GPT-4o). Each curve averages 5 runs with distinct random seeds for the language models, and the standard error of the mean is shown by shading. There is reliable cultural evolution of cooperation across generations for Claude 3.5 Sonnet but not for Gemini 1.5 Flash or GPT-4o with our prompting strategy.

We find substantial divergence in the evolution of cooperation across the models examined, as seen here in the average final scores after each generation.

1 year ago 0 0 1 0

Before the game, agents are prompted to create a strategy.

After 12 rounds, the best-performing 50% survive to the next generation.

When new agents in that generation create a strategy, the prompt includes the strategies of the survivors, enabling cultural transmission
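
The selection and transmission step described above can be sketched roughly as follows. This is only an illustration, not the paper's code: `next_generation`, `make_strategy`, and the agent naming scheme are hypothetical, and `make_strategy` stands in for whatever LLM prompt produces a new agent's strategy from the survivors' strategies.

```python
def next_generation(final_resources, strategies, make_strategy, generation):
    """Keep the top-performing half and seed newcomers with survivor strategies.

    final_resources: dict mapping agent name -> resources after the last round.
    strategies: dict mapping agent name -> strategy text.
    make_strategy: hypothetical callable standing in for the LLM prompt; it
        receives the list of survivor strategies and returns a new strategy.
    """
    # Rank agents from best to worst by final resources.
    ranked = sorted(final_resources, key=final_resources.get, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    survivor_strategies = [strategies[name] for name in survivors]

    # Survivors carry their strategies into the next generation unchanged.
    new_strategies = {name: strategies[name] for name in survivors}

    # Newcomers fill the remaining slots and see the survivors' strategies.
    for i in range(len(ranked) - len(survivors)):
        new_strategies[f"g{generation}_{i}"] = make_strategy(survivor_strategies)
    return survivors, new_strategies
```

In this sketch, cultural transmission happens only through the `survivor_strategies` argument passed to `make_strategy`.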

1 year ago 1 0 1 0

Each round, players are randomly paired as donor and recipient. The donor gives up some amount and the recipient receives 2x.

Donors know how the recipient and others have previously behaved as donors, giving them reputation info that could support indirect reciprocity.
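
As a rough sketch of a single round under simplifying assumptions: here each donor gives a fixed fraction of its current resources, whereas in the study the donating agent decides the amount itself using the reputation information. The function name `play_round`, the `donate_fraction` mapping, and the starting balances in any usage are illustrative assumptions, not taken from the paper.

```python
import random


def play_round(resources, donate_fraction, rng, multiplier=2.0):
    """Play one Donor Game round and return the list of donations made.

    resources: dict mapping player name -> current resources (mutated in place).
    donate_fraction: assumed fixed share of resources each player gives as donor.
    rng: a random.Random instance used for the random pairing.
    """
    players = list(resources)
    rng.shuffle(players)

    history = []
    # Consecutive players in the shuffled order form (donor, recipient) pairs.
    for donor, recipient in zip(players[0::2], players[1::2]):
        gift = donate_fraction[donor] * resources[donor]
        resources[donor] -= gift
        # The recipient receives the donation scaled by the multiplier (2x).
        resources[recipient] += multiplier * gift
        # The record of past donations is what supplies reputation information.
        history.append((donor, recipient, gift))
    return history
```

Because the recipient gains twice what the donor gives up, total resources grow by exactly the amount donated each round, which is why average final resources work as a measure of cooperation.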

1 year ago 1 0 1 0

AI agents will soon be deployed at scale in the real world, but relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. We investigated this by studying a Donor Game with cultural evolution.

1 year ago 0 0 1 0

Very excited to announce a new paper—Cultural Evolution of Cooperation Among LLM agents—coauthored with @edwardfhughes

We study whether LLM agents can develop cooperative norms when interacting with each other, and find considerable differences across models.

1 year ago 2 2 1 0
Research agenda - Global Priorities Institute The central focus of GPI is what we call ‘global priorities research’: research into issues that arise in response to the question, ‘What should we do with a given amount of limited resources if our a...

We’re excited to announce our new research agendas – for philosophy, economics and psychology – have now been published! You can read them here: globalprioritiesinstitute.org/research-age...

1 year ago 19 6 0 0

Plenty of interesting papers in this PNAS special feature on half a century of cultural evolution www.pnas.org/topic/565

1 year ago 4 2 0 0
OUT 1 AND ITS DOUBLE | Jonathan Rosenbaum

jonathanrosenbaum.net/2024/04/out-...

1 year ago 0 0 0 0

Out 1 has several hours of barely watchable experimental theatre rehearsals but is still one of my favorite films of all time

1 year ago 0 0 1 0

Lots of Westerns are of course concerned with institutional economics, e.g. The Man Who Shot Liberty Valance. Much of Jia Zhangke’s filmography deals with China’s economic development. Same for Edward Yang and Taiwan.

1 year ago 2 0 0 0
What Children Can Do That Large Language Models Cannot (Yet) - Study Journal Paper by Yiu et al (2023). They argue that LLMs and vision models should not be thought of as individual agents, but rather as new cultural

Interesting perspective on LLMs, though “yet” may indeed turn out to be the key word

2 years ago 3 0 0 0