Hi all!
We are curating SLAyiNG, a dataset of queer slang. To ensure the quality of the final data, we are asking the community for help with annotation.
Sign up at: docs.google.com/forms/d/e/1F...
If you have further inquiries, feel free to contact either me or @leahirlimann.bsky.social directly.
Posts by CIS, LMU Munich
Talk @ LowResLM: Beyond the standard: NLP for low-resource language varieties
by @barbaraplank.bsky.social
Sun 29 Mar - S. Le Lamrissa 14:00-15:00
loreslm.github.io/program
VarDial: Workshop on NLP for similar languages, varieties and dialects
Yves Scherrer, Noëmi Aepli, @verenablaschke.bsky.social, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Sun 29 Mar - S. Le Chellah 9:00-12:30
bsky.app/profile/vere...
Panel discussion on teaching NLP @ TeachingNLP
@ivanhabernal.bsky.social, Mausam, Hinrich Schütze, @barbaraplank.bsky.social
Sun 29 Mar - S. La Palmeraie @ 9:30-10:30
sites.google.com/view/teachin...
Talk @ AfricaNLP: The emergence of multilingual representations: Tracing linguistic capabilities during language model pretraining
by @barbaraplank.bsky.social
Sat 28 Mar - S. Le Lixus @ 11:20-12:00
sites.google.com/view/african...
Controlling reading ease with gaze-guided text generation
[Poster]: Fri 27 Mar - Poster Hall @ 9:00-10:30
@saeub.bsky.social , Darja Jepifanova,
@Diego Frassinelli, @barbaraplank.bsky.social
arxiv.org/abs/2601.17781
If probable, then acceptable? Understanding conditional acceptability judgments in large language models
[Oral]: Thu 26 Mar - S. Le Lixus @ 09:00-10:30
Jasmin Orth, @pmondorf.bsky.social , @barbaraplank.bsky.social
aclanthology.org/2026.eacl-lo...
A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages
[Poster]: Wed 25 Mar - Poster Hall @ 16:30-18:00
@raoyuan.bsky.social, Yihong Liu, Hinrich Schütze, @mhedderich.bsky.social
bsky.app/profile/raoy...
When meanings meet: Investigating the emergence and quality of shared concept spaces during multilingual language model training
[Oral]: Wed 25 Mar - S. La Palmeraie @ 16:30-18:00
@fkoerner.bsky.social , @mxij.me , Anna Korhonen , @barbaraplank.bsky.social
aclanthology.org/2026.eacl-lo...
Too open for opinion? Embracing open-endedness in large language models for social simulation
[Oral]: Wed 25 Mar - S. Walil @ 14:30-16:00
@boleima.bsky.social, @yongcao.bsky.social, Indira Sen, Anna-Carolina Haensch, Frauke Kreuter, @barbaraplank.bsky.social, @danielhers.bsky.social
Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions
[Oral]: Wed 25 Mar - S. Le Riad @ 11:30-13:00
@pedrohluzaraujo.bsky.social, @mhedderich.bsky.social, @amodarressi.bsky.social, Hinrich Schütze, Benjamin Roth
bsky.app/profile/pedr...
Going to Rabat for #EACL2026? So are we! 🇲🇦
We are bringing a packed schedule of papers, talks, and workshops.
Check out our lineup below and come say hi! 🧵
#NLProc @eaclmeeting.bsky.social
Life update
After a wonderful time at @ai2.bsky.social, I've joined @cislmu.bsky.social at @lmu.de as a tenure-track assistant professor in NLP. Thrilled to be back in Europe and to start a lab in Munich's flourishing AI ecosystem!
Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken
At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.
I'll be at @icmlconf.bsky.social next week presenting NoLiMa!
Poster on Tue July 15, 4:30–7pm (E-2312).
Happy to grab a coffee and chat about long-context, memory, research, or just to catch up.
I'll be in Toronto for a couple of days after the conference, let me know if you're around!
New paper: How does pretraining on programming languages + English shape LLMs' concept space?
Do LLMs use English or a programming language as a kind of pivot language?
Are neurons language-specific or shared across programming languages and English?
arxiv.org/abs/2506.01074
Collapse of Dense Retrievers
Accepted to #ACL2025 main conference
In this paper we uncover major vulnerabilities in dense retrievers like Contriever, showing they favor:
Shorter docs
Early positions
Repeated entities
Literal matches
...all while ignoring the answer's presence!
Beyond “noisy” text: How (and why) to process dialect data
Keynote talk at WNUT @ NAACL
@verenablaschke.bsky.social
Workshop on noisy and user-generated text (May 3)
The full workshop programme is here: noisy-text.github.io/2025/
bsky.app/profile/vere...
Privacy-Preserving Federated Learning for Hate Speech Detection
We present a federated learning system with differential privacy and fine-tuned ALBERT models for low-resource hate speech detection.
Ivo Júnior, @htyeh1, Axel Wisiorek, @HinrichSchuetze
SRW - Long
Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification
Analysis of linguistic features used by German BERT in a classification task.
Henrike Beyer (University of Dundee), Diego Frassinelli
SRW - Short
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
A simple yet effective method to retrieve cross-lingual few-shot examples for multilingual in-context learning.
@lpq29743, @andre_t_martins, @HinrichSchuetze
arxiv.org/abs/2405.05116
Findings - Short
Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
We predict speech-to-text model performance on dialect continua with geostatistics.
Ryan Soh-Eun Shim, Barbara Plank
arxiv.org/abs/2410.14589
Findings - Long
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
An investigation of the impact of parallel corpora, ... on the performance of multilingual LLMs.
@lpq29743, @andre_t_martins, @HinrichSchuetze
arxiv.org/abs/2407.00436
Findings - Long
🥳 We are happy to share that CIS will be presenting 6 papers and talks at #NAACL2025!
Find out about each of them below in the 🧵
On my way to #NAACL2025 where I'll give a keynote at the noisy text workshop (WNUT), presenting some of the challenges & methods for dialect NLP + also discussing dialect speakers' perspectives!
Beyond “noisy” text: How (and why) to process dialect data
Saturday, May 3, 9:30–10:30