
Posts by Grgur Kovač

This work was heavily inspired by many amazing works such as:
www.nature.com/articles/s41...
arxiv.org/abs/2404.01413
arxiv.org/abs/2311.09807
arxiv.org/abs/2402.0704


P.S. This project wraps up my PhD research exploring how to leverage human sciences (psychology, cultural evolution) to better evaluate and understand LLMs.
I am now on the job market for EU-based remote roles in industry (LLM Researcher/Engineer). I’d love to connect! 👋


This was done with:
@kovacgrgur.bsky.social*, Jérémy Perez*, Remy Portelas, Peter Ford Dominey, @pyoudeyer.bsky.social
(*equal contribution)
In the FlowersTeam, INRIA


Caveat: Model collapse is a nascent field, and current studies make many assumptions w.r.t. real-world dynamics. Here we explore one assumption - the homogeneity of data - but many more remain to be explored!


Implication: Together, these two takeaways imply that different internet domains could exhibit different collapse dynamics, depending on the data properties of each domain.


Finding 2: The effects are within-domain. For LLMs trained on multiple domains, drops in one domain (e.g. Reddit) are influenced by that domain's own properties (not those of Twitter/X or Wikipedia), i.e. effects do not spill over to other domains.


Finding 1: Human data properties influence collapse dynamics. Some human data properties (lexical diversity, Gaussianity) are associated with bigger drops in both the quality and semantic diversity of generated text, and others (quality, semantic diversity) with smaller drops.
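To make "lexical diversity" concrete, here is a minimal illustrative sketch of one common lexical-diversity measure (type-token ratio). This is just an example metric; the paper's exact measures are not reproduced here.

```python
# Illustrative sketch: type-token ratio as a simple lexical-diversity
# measure (unique tokens / total tokens, naive whitespace tokenization).

def type_token_ratio(text: str) -> float:
    """Return the ratio of unique tokens to total tokens in `text`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# "the" repeats, so 5 unique tokens out of 6 total.
print(type_token_ratio("the cat sat on the mat"))
```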


We used an iterative chain design: iteratively fine-tuning base LLMs on data generated by previously fine-tuned models.

We then used regression analysis to find associations between human data properties and relative drops in the quality and semantic diversity of LLM-generated data.
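The chain design above can be sketched in a few lines. `finetune` and `generate` are hypothetical placeholders standing in for the actual training and sampling code, which is not shown here.

```python
# Hedged sketch of an iterative chain: at each generation, a fresh copy of
# the base model is fine-tuned on the previous generation's outputs, and
# its generations become the next generation's training data.

def run_chain(base_model, human_data, n_generations, finetune, generate):
    data = human_data          # generation 0 trains on human data
    models = []
    for _ in range(n_generations):
        model = finetune(base_model, data)  # always restart from the base model
        data = generate(model)              # synthetic data for the next round
        models.append(model)
    return models
```

The key design choice (per the post) is that each generation fine-tunes the *base* model, not the previous fine-tuned one, so only the data carries information across generations.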


#LLMs are trained on internet data, which contains an increasing share of synthetic data. These LLMs then generate new online data, which will in turn be used to train future LLMs.

Will this closed loop result in future models generating data of lower quality and diversity (i.e. collapse)?

Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data? Large language models (LLMs) are increasingly used in the creation of online content, creating feedback loops as subsequent generations of models will be trained on this synthetic data. Such loops wer...

📄 Paper: arxiv.org/abs/2504.03814


Will the influx of synthetic data lead to uniform #ModelCollapse across the internet?
Our recent #EMNLP2025 (Oral) paper suggests a nuanced picture: different collapse dynamics might emerge in different internet domains based on the properties of human data in those domains! 🧵


What's wrong with evaluating #LLMs after a single interaction? Come find out @iclr-conf.bsky.social and learn how cultural attraction theory can help us do better. Poster #288, 10 am.

MAGELLAN: Metacognitive predictions of learning progress guide... Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM...

🚀 Introducing 🧭MAGELLAN—our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains.🌍✨Learn more: 🔗 arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL


The leaderboard is explained in our previous tweet (haven't transferred it to Bluesky yet) 😐:
x.com/KovacGrgur/s...


Llama 3.3 is great, but Nemotron is still the leader on our StickToYourRole Leaderboard!
Nemotron 🥇
Llama 3.3 🥈

huggingface.co/spaces/flowe...


I'm excited to announce that this work has been accepted at
@blog.neurips.cc.web.brid.gy 🧠🤖 We hope to spark conversations on goal selection in biological and artificial agents.

Check it out at openreview.net/forum?id=Gbq...

With Cédric Colas, Pierre-Yves Oudeyer, & Anne Collins


🚨New preprint🚨
When testing LLMs with questions, how can we know they did not see the answers in their training data? In this new paper we propose a simple, fast, out-of-the-box method to spot contamination in short texts, with @stepalminteri.bsky.social and Pierre-Yves Oudeyer!
