CIS, LMU Munich (@cislmu) Bsky

Hi all!
We are curating SLAyiNG, a dataset of queer slang. To ensure the quality of the final data, we are asking the community for help with annotation.
Sign up at: docs.google.com/forms/d/e/1F...
If you have further inquiries, feel free to contact either me or @leahirlimann.bsky.social directly 🌈

3 weeks ago 2 2 1 0

The Second Workshop on Language Models for Low-Resource Languages

Talk @ LowResLM: Beyond the standard: NLP for low-resource language varieties
by @barbaraplank.bsky.social
Sun 29 Mar - S. Le Lamrissa @ 14:00-15:00
loreslm.github.io/program

4 weeks ago 0 0 0 0

VarDial: Workshop on NLP for similar languages, varieties and dialects
Yves Scherrer, Noëmi Aepli, @verenablaschke.bsky.social, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Sun 29 Mar - S. Le Chellah @ 9:00-12:30
bsky.app/profile/vere...

4 weeks ago 1 0 1 0

TeachingNLP @ EACL 2026 The Seventh Workshop on Teaching NLP will be co-located with the 2026 Conference of the European Chapter of the Association for Computational Linguistics in Rabat, Morocco. The one-day workshop will c...

Panel discussion on teaching NLP @ TeachingNLP
@ivanhabernal.bsky.social, Mausam, Hinrich Schütze, @barbaraplank.bsky.social

Sun 29 Mar - S. La Palmeraie @ 9:30-10:30
sites.google.com/view/teachin...

4 weeks ago 0 1 1 0

AfricaNLP 2026 About the Workshop

Talk @ AfricaNLP: The emergence of multilingual representations: Tracing linguistic capabilities during language model pretraining
by @barbaraplank.bsky.social
Sat 28 Mar - S. Le Lixus @ 11:20-12:00
sites.google.com/view/african...

4 weeks ago 0 0 1 0

Controlling Reading Ease with Gaze-Guided Text Generation The way our eyes move while reading can tell us about the cognitive effort required to process the text. In the present study, we use this fact to generate texts with controllable reading ease. Our me...

Controlling reading ease with gaze-guided text generation
[Poster]: Fri 27 Mar - Poster Hall @ 9:00-10:30
@saeub.bsky.social, Darja Jepifanova, Diego Frassinelli, @barbaraplank.bsky.social
arxiv.org/abs/2601.17781

4 weeks ago 0 0 1 0

If probable, then acceptable? Understanding conditional acceptability judgments in large language models
[Oral]: Thu 26 Mar - S. Le Lixus @ 09:00-10:30
Jasmin Orth, @pmondorf.bsky.social, @barbaraplank.bsky.social
aclanthology.org/2026.eacl-lo...

4 weeks ago 0 0 1 0

A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages
[Poster]: Wed 25 Mar - Poster Hall @ 16:30-18:00
@raoyuan.bsky.social, Yihong Liu, Hinrich Schütze, @mhedderich.bsky.social
bsky.app/profile/raoy...

4 weeks ago 0 0 1 0

When meanings meet: Investigating the emergence and quality of shared concept spaces during multilingual language model training
[Oral]: Wed 25 Mar - S. La Palmeraie @ 16:30-18:00
@fkoerner.bsky.social, @mxij.me, Anna Korhonen, @barbaraplank.bsky.social
aclanthology.org/2026.eacl-lo...

4 weeks ago 1 0 1 0

Too open for opinion? Embracing open-endedness in large language models for social simulation
[Oral]: Wed 25 Mar - S. Walil @ 14:30-16:00
@boleima.bsky.social, @yongcao.bsky.social, Indira Sen, Anna-Carolina Haensch, Frauke Kreuter, @barbaraplank.bsky.social, @danielhers.bsky.social

4 weeks ago 0 0 1 0

Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions
[Oral]: Wed 25 Mar - S. Le Riad @ 11:30-13:00
@pedrohluzaraujo.bsky.social, @mhedderich.bsky.social, @amodarressi.bsky.social, Hinrich Schütze, Benjamin Roth
bsky.app/profile/pedr...

4 weeks ago 1 0 1 1

Going to Rabat for #EACL2026? So are we! 🇲🇦
We are bringing a packed schedule of papers, talks, and workshops.
Check out our lineup below and come say hi! 👋 🧵
#NLProc @eaclmeeting.bsky.social

4 weeks ago 4 2 1 0

📢 Life update 📢

After a wonderful time at @ai2.bsky.social, I've joined @cislmu.bsky.social at @lmu.de as a tenure-track assistant professor in NLP. Thrilled to be back in Europe and to start a lab in Munich's flourishing AI ecosystem! 🎉

1 month ago 29 1 2 0

Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken

At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.

8 months ago 16 4 1 1

I’ll be at @icmlconf.bsky.social next week presenting NoLiMa!
Poster on Tue July 15, 4:30–7pm (E-2312).

Happy to grab a coffee and chat about long-context, memory, research, or just to catch up.

I’ll be in Toronto for a couple of days after the conference, let me know if you’re around!

9 months ago 4 2 1 0

New paper: How does pretraining on programming languages + English shape LLMs' concept space?
🔍 Do LLMs use English or a programming language as a kind of pivot language?
🧠 Are neurons language-specific or shared across programming languages and English?
🔗 arxiv.org/abs/2506.01074

10 months ago 6 1 1 0

📄 Collapse of Dense Retrievers

Accepted to #ACL2025 main conference 🎉🎉

In this paper we uncover major vulnerabilities in dense retrievers like Contriever, showing they favor:
📌 Shorter docs
📌 Early positions
📌 Repeated entities
📌 Literal matches
...all while ignoring the answer's presence!

11 months ago 9 2 1 1

🗨️ Beyond “noisy” text: How (and why) to process dialect data
🔎 Keynote talk at WNUT @ NAACL
👥 @verenablaschke.bsky.social
📁 Workshop on noisy and user-generated text (May 3)
The full workshop programme is here: noisy-text.github.io/2025/
bsky.app/profile/vere...

11 months ago 2 1 0 0

📝 Privacy-Preserving Federated Learning for Hate Speech Detection
🔎 We present a federated learning system with differential privacy and fine-tuned ALBERT models for low-resource hate speech detection.
👥 Ivo Júnior, @htyeh1, Axel Wisiorek, @HinrichSchuetze
📁 SRW - Long

11 months ago 1 0 1 0

📝 Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification
🔎 Analysis of linguistic features used by German BERT in a classification task.
👥 Henrike Beyer (University of Dundee), Diego Frassinelli
📁 SRW - Short

11 months ago 0 0 1 0

XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples Recent studies indicate that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving relevant in-context examples tailored to the input query, enhances few-shot in-context learning of...

📝 XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
🔎 A simple yet effective method to retrieve cross-lingual few-shot examples for multilingual in-context learning
👥 @lpq29743, @andre_t_martins, @HinrichSchuetze
🔗 arxiv.org/abs/2405.05116
📁 Findings - Short

11 months ago 0 0 1 0

Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum There is increasing interest in looking at dialects in NLP. However, most work to date still treats dialects as discrete categories. For instance, evaluative work in variation-oriented NLP for English...

📝 Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
🔎 We predict speech-to-text model performance on dialect continua with geostatistics.
👥 Ryan Soh-Eun Shim, Barbara Plank
🔗 arxiv.org/abs/2410.14589
📁 Findings - Long

11 months ago 0 0 1 0

A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models Recent studies have highlighted the potential of exploiting parallel corpora to enhance multilingual large language models, improving performance in both bilingual tasks, e.g., machine translation, an...

📝 A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
🔎 An investigation of the impact of parallel corpora, ... on the performance of multilingual LLMs.
👥 @lpq29743, @andre_t_martins, @HinrichSchuetze
🔗 arxiv.org/abs/2407.00436
📁 Findings - Long

11 months ago 1 0 1 0

🥳 We are happy to share that CIS will be presenting 6 papers and talks at #NAACL2025!
Find out about each of them below in the 🧵

11 months ago 10 0 1 1

On my way to #NAACL2025 where I'll give a keynote at the noisy text workshop (WNUT), presenting some of the challenges & methods for dialect NLP + also discussing dialect speakers' perspectives!

🗨️ Beyond “noisy” text: How (and why) to process dialect data
🗓️ Saturday, May 3, 9:30–10:30

11 months ago 27 7 1 1
