LLMs can retrieve knowledge — but can they connect it in *creative* ways to solve problems?
Introducing CresOWLve 🦉, a new benchmark that evaluates creative problem-solving over real-world knowledge, using puzzles that require multiple creative thinking strategies.👇
1/ 🌍 How does mixing data from hundreds of languages affect LLM training?
In our new paper "Revisiting Multilingual Data Mixtures in Language Model Pretraining" we revisit core assumptions about multilinguality using 1.1B-3B models trained on up to 400 languages.
🧵👇
🎤 Prof. Iryna Gurevych, Distinguished Professor at the National Research Center for Applied Cybersecurity @athenecenter.bsky.social and Director of the UKP Lab at @tuda.bsky.social, delivered a keynote on “How to Make AI-Native Internet Content Secure? Coping with Synthetic and Misleading Data.”
🎉 Congratulations to Assistant Professors @abosselut.bsky.social (IC), @bunnech.bsky.social (IC & SV), and @mschrimpf.bsky.social (IC & SV) for being selected as #AI2050 Early Career Fellows by @schmidtsciences.bsky.social!
🔗 Full article: actu.epfl.ch/news/epfl-pr...
Recruiting PhDs & postdocs for:
🤖 agents "taking over" science (hypogenic.ai and 📌)
🧪 Real scientists ➡️AI (e.g., materials, chem, physics)
📜 Theory + incentives for H-AI collab & credit (e.g., formalizing tacit knowledge)
new adventures for me, 🔄 if you can! 🙌
chenhaot.com/recruiting.h...
EPFL AI Center Postdocs: www.epfl.ch/research/fun...
NLP Lab Postdoc: docs.google.com/document/d/1...
If you're interested in doing a postdoc at @icepfl.bsky.social, there's still time to apply for the @epfl-ai-center.bsky.social postdoctoral fellowships.
Apart from this, I'm also recruiting postdocs to develop novel training algorithms for reasoning models and agentic AI.
Join us again at #MELT workshop (520D) at #COLM2025 to hear from @ImanolSchlag about #Apertus, the largest multilingual LLM trained on over 1000 languages.
Kicking off #MELT workshop at #COLM2025 with Monojit Choudhury talking about "Meta-Cultural Competence: What LLMs Should Know About Culture to Serve the Next Billion Users"!
Come join us in 520D (all the way down the hall and around the corner) at #COLM2025 for the first workshop on multilingual and equitable language technologies!
Very happy this paper got accepted to NeurIPS 2025 as a Spotlight! 😁
Main takeaway: In mechanistic interpretability, we need assumptions about how DNNs encode concepts in their representations (e.g., the linear representation hypothesis). Without them, we can claim any DNN implements any algorithm!
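(A toy illustration of what that assumption buys you, my own and not code from the paper: under the linear representation hypothesis, a concept corresponds to a direction in activation space, so a simple linear probe can read it out. Everything below is synthetic.)

```python
# Toy sketch: if a concept is linearly encoded, a linear probe recovers
# it from hidden states. Data and "concept direction" are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64
concept_dir = rng.normal(size=d)               # hypothetical concept direction
acts = rng.normal(size=(1000, d))              # stand-in "hidden states"
labels = (acts @ concept_dir > 0).astype(int)  # concept IS a linear direction here

probe = LogisticRegression(max_iter=1000).fit(acts, labels)
print("probe accuracy:", probe.score(acts, labels))  # near 1.0 by construction
```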
What's the right unit of analysis for understanding LLM internals? We explore this in our mech interp survey (a major update of our 2024 manuscript).
We’ve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
I don't see why the answer would be no, but since you specifically say "October": what if we submitted to ARR in July and want to do an early submission to ACL 2026?
1/🚨 New preprint
How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.
#interpretability
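(For context, a minimal sketch of the crosscoder idea, my own simplification rather than the paper's code: one shared sparse latent space with per-checkpoint encoders and decoders, so the same feature dictionary is fit across training checkpoints. All shapes and names below are illustrative assumptions.)

```python
# Minimal crosscoder sketch over several checkpoints: a shared sparse
# code, with separate encoder/decoder weights per checkpoint.
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int, n_checkpoints: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(n_checkpoints, d_model, d_latent) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(n_checkpoints, d_latent, d_model) * 0.01)
        self.b_latent = nn.Parameter(torch.zeros(d_latent))

    def forward(self, acts: torch.Tensor):
        # acts: (n_checkpoints, batch, d_model) — activations from one
        # layer, for the same inputs run through each checkpoint.
        pre = torch.einsum("cbd,cdl->bl", acts, self.W_enc) + self.b_latent
        z = torch.relu(pre)                              # shared sparse code
        recon = torch.einsum("bl,cld->cbd", z, self.W_dec)
        return z, recon

def crosscoder_loss(acts, z, recon, l1_coef=1e-3):
    # Per-checkpoint reconstruction + L1 sparsity on the shared code.
    return ((recon - acts) ** 2).mean() + l1_coef * z.abs().mean()
```

After training, comparing `W_dec[c, j].norm()` across checkpoints `c` for a fixed latent `j` is one way to see a feature appear, strengthen, or fade over training.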
💡Can we optimize LLMs to be more creative?
Introducing Creative Preference Optimization (CrPO) and MuCE (Multi-task Creativity Evaluation Dataset).
Result: More novel, diverse, surprising text—without losing quality!
📝 Appearing at #EMNLP2025
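(A hypothetical sketch of the general recipe, not CrPO's actual formulation: rank candidate generations with creativity metrics to form chosen/rejected pairs, then apply a DPO-style preference loss. The metric names and weights below are placeholders.)

```python
# Sketch: preference optimization where pairs are ranked by creativity.
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Standard DPO objective: favor responses whose log-ratio against the
    # reference model exceeds that of the rejected response.
    logits = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(logits).mean()

def creativity_score(novelty, surprise, diversity, quality, w=(1.0, 1.0, 1.0, 1.0)):
    # Hypothetical aggregate used to rank candidate generations into
    # chosen/rejected pairs; real scores would come from annotated data
    # (e.g., a dataset like MuCE), not these placeholder scalars.
    return w[0]*novelty + w[1]*surprise + w[2]*diversity + w[3]*quality
```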
Special thanks to everyone who participated in this journey!
(5) Transparency: We're fully open, pairing our weights with a full suite of reproduction artifacts.
Check out our artifacts and technical report here: huggingface.co/swiss-ai
(4) Multilinguality: We pretrain the model on 15T tokens from 1811 languages, and post-train with 3.8M examples from 149 languages
(3) Memorization Prevention: Adopting the Goldfish objective, we suppress verbatim recall and reduce risks of memorization
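(For context, the Goldfish loss of Hans et al., 2024, drops a pseudo-random subset of token positions from the next-token loss, so no passage is ever fully supervised and verbatim recall is suppressed. Below is a minimal sketch of that recipe, not the Apertus implementation; the hash and hyperparameters are toy assumptions.)

```python
# Goldfish-style loss sketch: mask ~1-in-k positions out of the loss,
# chosen by hashing the preceding tokens so the same passage always
# drops the same positions.
import torch
import torch.nn.functional as F

def goldfish_mask(input_ids: torch.Tensor, k: int = 4, h: int = 13) -> torch.Tensor:
    # input_ids: (B, T). Hash a window of h preceding tokens; drop the
    # position when the hash lands in bucket 0 of k. (Toy hash.)
    B, T = input_ids.shape
    mask = torch.ones(B, T, dtype=torch.bool)
    for t in range(h, T):
        window = input_ids[:, t - h:t]
        bucket = (window.long().sum(dim=-1) * 2654435761) % k
        mask[:, t] = bucket != 0
    return mask  # True = position contributes to the loss

def goldfish_loss(logits, targets, input_ids, k=4):
    # logits: (B, T, V); targets, input_ids: (B, T)
    mask = goldfish_mask(input_ids, k=k)
    loss = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    return (loss * mask).sum() / mask.sum()
```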
(2) Data Compliance: We pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for copyrighted, non-permissive, toxic, and personally identifiable content
What makes Apertus special?
(1) Scale: Apertus-70B is the first fully open model to be trained at 70B parameter scale on 15T tokens, requiring us to scale out training to 4096 GPUs at @cscsch.bsky.social
The next generation of open LLMs should be inclusive, compliant, and multilingual by design. That’s why we (@icepfl.bsky.social, @ethz.ch, @cscsch.bsky.social) built Apertus.
EPFL, @ethz.ch and the @cscsch.bsky.social released Apertus today, Switzerland’s first large-scale, open, multilingual language model — a milestone in generative AI for transparency and diversity.
Find out more here: ai.epfl.ch/apertus-a-fu...
@abosselut.bsky.social @icepfl.bsky.social
EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model.
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.
Read more: actu.epfl.ch/news/apertus...
Very happy to see that Pleias' multilingual data processing pipelines have contributed to the largest open pretraining project in Europe.
From their tech report: huggingface.co/swiss-ai/Ape...
Switzerland is entering the race for large language models. Under the name #Apertus, @ethz.ch, @icepfl.bsky.social and the @cscsch.bsky.social are releasing the country's first fully open, multilingual #LLM.
For MAZ, I wrote a short analysis of Apertus:
www.maz.ch/news/apertus...
Thank you for your incredible work!
Recently gave a talk on "Reality Checks" at two venues, where I discussed (and rambled about) how leaderboard chasing is awesome (and we want it to continue), but also how this isn't easy, because everyone (me! me! me!) wants to write more papers.
The link to the slide deck is in the reply.
🚨New Preprint!
In multilingual models, the same meaning can take far more tokens in some languages, penalizing users of underrepresented languages with worse performance and higher API costs. Our Parity-aware BPE algorithm is a step toward addressing this issue: 🧵
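(A toy sketch of one way to make BPE parity-aware, not necessarily the paper's exact selection rule: instead of always taking the globally most frequent pair, pick the best merge for whichever language currently compresses worst, pushing tokens-per-text toward parity. The corpora and compression proxy below are illustrative.)

```python
# Toy parity-aware BPE: each merge favors the worst-compressed language.
from collections import Counter

def pair_counts(corpus):
    counts = Counter()
    for word in corpus:                      # word = list of symbols
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += 1
    return counts

def apply_merge(corpus, pair):
    merged = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1]); i += 2
            else:
                out.append(word[i]); i += 1
        merged.append(out)
    return merged

def parity_aware_bpe(corpora, n_merges):
    # corpora: {lang: list of words}. Compression proxy: tokens per word.
    merges = []
    for _ in range(n_merges):
        worst = max(corpora, key=lambda l: sum(len(w) for w in corpora[l]) / len(corpora[l]))
        counts = pair_counts(corpora[worst])
        if not counts:
            break
        pair = counts.most_common(1)[0][0]   # best merge for the worst language
        merges.append(pair)
        # The merge enters the single shared vocabulary: apply everywhere.
        corpora = {l: apply_merge(c, pair) for l, c in corpora.items()}
    return merges

corpora = {
    "en": [list("lower"), list("lowest")],
    "tr": [list("evlerimizde"), list("evler")],
}
print(parity_aware_bpe(corpora, 5))
```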