Posts by CLAUSE - Computational Linguistics @ Bielefeld University

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prevot,...

And on Friday at 11am, BabyBabelLM (aclanthology.org/2026.eacl-lo...), to which our very own @bbunzeck.bsky.social contributed the German, Polish and various multilingual datasets, will be presented by @jumelet.bsky.social from @gronlp.bsky.social! 🥳

3 weeks ago 4 3 0 0
Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets Omar Momen, Emilie Sitter, Berenike Herrmann, Sina Zarrieß. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2026.

On Thursday at 11am, Omar Momen will give a talk on "Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets" (Omar Momen, Emilie Sitter, Berenike Herrmann, Sina Zarrieß, aclanthology.org/2026.eacl-lo...)!

3 weeks ago 4 1 1 0

Before we forget: of course, CLAUSE Bielefeld is also at @eaclmeeting.bsky.social 2026 in Rabat/Morocco!

Our group is involved in main conference papers:

3 weeks ago 2 2 1 0
Stefan Hartmann presenting a slide

Happening right now — @stefanhartmann.bsky.social presenting an extremely interesting case study on snowclones like »x is the new y«. 🗣️

2 months ago 13 1 0 1

Tomorrow!

3 months ago 7 1 0 0

I have just returned from a week-long visit to Bielefeld University! Thank you very much for hosting me Sina Zarrieß and @ozgealacam.bsky.social 😊 @clausebielefeld.bsky.social

3 months ago 8 2 1 0

This week we’re having @ecekt.bsky.social as our guest in Bielefeld. She gave a highly timely talk on language+vision models, how they process images under noise conditions, and how to train a highly effective multimodal BabyLM with model merging. 🗣️👀💻

3 months ago 12 1 0 1

For years since the GPT-2 paper, emergent in-context learning (ICL) from 'next-token' training has been treated as something deeply tied to 𝐡𝐮𝐦𝐚𝐧 𝐥𝐚𝐧𝐠𝐮𝐚𝐠𝐞. But … is it?

5 months ago 2 2 1 1
AI generated image

Am I evil? Am I likeable?

Need a 10-minute break? Like fantasy? Loathe it? Take part in our study and help us by rating images of fictional characters here:
bixprag.lili.uni-bielefeld.de/publix/0aSWK...

5 months ago 2 5 0 0

For this week’s group colloquium, we invited Loulou Kosmala from Paris-Est Créteil University. She gave a talk on multimodal feedback during all types of conversation, from real life to virtual, from learners to adults, from L1 to L2, and more! 🤩

5 months ago 3 0 0 0
Dialogue Is Not Enough to Make a Communicative BabyLM
(But Neither Is Developmentally Inspired Reinforcement Learning)
Francesca Padovani1∗ Bastian Bunzeck2∗ Manar Ali2 Omar Momen2
Arianna Bisazza1 Hendrik Buschmeier2 Sina Zarrieß2
1Center for Language and Cognition (CLCG), University of Groningen
2CRC 1646 – Linguistic Creativity in Communication, Bielefeld University
f.padovani@rug.nl bastian.bunzeck@uni-bielefeld.de

As part of this year's BabyLM challenge, we (researchers from @gronlp.bsky.social and @clausebielefeld.bsky.social) diverged from the established pretraining paradigm by training only on dialogue data from CHILDES.

5 months ago 16 3 1 0

Preprint alert! We release BabyBabelLM, a multilingual benchmark of developmentally plausible training data. I was responsible for German and Polish data as well as various child-directed wikis. Immensely rewarding project with exceptionally cool co-authors. 🥳🚀

6 months ago 11 3 0 1

𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪

Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉

arxiv.org/abs/2510.10159

6 months ago 42 16 2 1

Happening in an hour! 🥳

6 months ago 1 0 0 0
If you are at #IWCS, then you should not miss Sanne's talk "Not Just Who or What: Modeling the Interaction of Linguistic and Annotator Variation in Hateful Word Interpretation" (Sanne Hoeken, Özge Alacam, Dong Nguyen, Massimo Poesio, Sina Zarrieß), tomorrow at 16:30! 🕟
@sannehoeken.bsky.social

6 months ago 4 1 0 1
Sina in front of a slide with different size circles

Sina Zarrieß is giving the KONVENS keynote on training BabyLMs #nlproc
The slide shows the number of words a 12yo human has seen in their lifetime compared to the number of words typical language models have seen in training #llm

7 months ago 6 3 0 0

Happening now: Sina's keynote on our BabyLM work. 🥳

7 months ago 5 0 0 1

Great first day at #KONVENS2025 today. Looking forward to another engaging day with a keynote by Sina Zarrieß tomorrow 🤓
@clausebielefeld.bsky.social

7 months ago 2 1 1 0

Don’t miss Sina's keynote on BabyLMs at #konvens tomorrow!

7 months ago 3 0 0 0

Final keynote of #semdial by David Schlangen on "Meaningful Interaction with Unreal Speakers?" 😇💬

7 months ago 2 0 1 0

Final day at #semdial2025 #bialogue: four more presentations, one keynote and hopefully many engaging discussions. Let's go!

7 months ago 0 1 0 0

Second #semdial keynote by Robert Hawkins on "Foraging for common ground"

7 months ago 3 0 0 0

Day 2 of #semdial starts with a session on LMs and dialogue systems 🤩

7 months ago 3 0 0 0

Actually yes! Dialogue differs distinctly from monologues in terms of phonetic features and in the production of novel phonetic forms!

7 months ago 2 0 0 0

Leonie Schade asks whether it takes two to do an articulatory tango 😁

7 months ago 6 1 1 0

And the second talk features contributions by our PI Sina Zarrieß. 🤩

7 months ago 6 0 1 0

#semdial has begun 💬

7 months ago 1 0 0 0

#semdial is about to begin 🥳

7 months ago 2 2 1 0