Posts by MaiNLP lab, LMU Munich

MaiNLP talks/posters/events at EACL

MaiNLP is happy to be part of @eaclmeeting.bsky.social with several papers, talks, a panel, and a workshop ☀️ Looking forward to seeing you in Rabat! #EACL2026

4 weeks ago

We are honoured to welcome Prof. Barbara Plank (@barbaraplank.bsky.social) from @mainlp.bsky.social and @cislmu.bsky.social as our keynote speaker.

LoResLM @eaclmeeting.bsky.social

5 months ago
Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance Expert persona prompting -- assigning roles such as expert in math to language models -- is widely used for task improvement. However, prior work shows mixed results on its effectiveness, and does not...

📢 New paper accepted at @eaclmeeting.bsky.social 2026:

Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions

with
@mhedderich.bsky.social
@amodarressi.bsky.social
Hinrich Schuetze
& Benjamin Roth.

Preprint: arxiv.org/abs/2512.12775

2 months ago

In our new paper, "A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages", we go beyond final-answer accuracy to analyze multilingual reasoning along three dimensions: performance, consistency, and faithfulness.

4 weeks ago

✨New paper✨

We find script (e.g. Cyrillic, Latin) to be a linear direction in the activation space of Whisper, enabling transliteration at test time by adding such script directions to the activations, producing e.g. Cyrillic Japanese transcriptions.

3 months ago
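The post describes script as a linear direction in Whisper's activation space that can be added at test time. The sketch below illustrates only the vector arithmetic, under clearly stated assumptions. The direction is estimated as a difference of mean activations between two scripts, a common steering recipe that is not necessarily the paper's method. Activations are plain Python lists rather than real Whisper hidden states. The helper names `mean_vector`, `script_direction`, and `steer` and the scale `alpha` are hypothetical.

```python
# Hypothetical sketch: estimate a "script direction" as a difference of mean
# activations and add it to an activation vector at test time.
# Toy lists stand in for real Whisper hidden states (assumption).
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def script_direction(target_acts: List[Vector], source_acts: List[Vector]) -> Vector:
    """Difference of means: points from the source script towards the target script."""
    target_mean = mean_vector(target_acts)
    source_mean = mean_vector(source_acts)
    return [t - s for t, s in zip(target_mean, source_mean)]


def steer(activation: Vector, direction: Vector, alpha: float = 1.0) -> Vector:
    """Add the scaled direction to one activation vector."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Made-up 3-dimensional activations collected for each script.
cyrillic_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
latin_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

direction = script_direction(cyrillic_acts, latin_acts)
print(direction)                                      # [2.0, -2.0, 0.0]
print(steer([0.5, 0.5, 1.0], direction, alpha=0.5))   # [1.5, -0.5, 1.0]
```

In a real model, the same addition would be applied to a chosen layer's hidden states during decoding, for example through a forward hook. The post does not state which layer or scale is used.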
VarDial @ EACL 2026, with important dates (see next post for text version). 
Photo CC-0.

VarDial 2026 will be colocated with @eaclmeeting.bsky.social! We're looking forward to your papers on NLP for similar languages, varieties and dialects :)

Deadline: Dec 19 (Jan 2 for pre-reviewed ARR papers)
sites.google.com/view/vardial...

6 months ago

Group photo at NeurIPS 2025 San Diego

4 months ago

Congrats to Pingjun, @beiduo.bsky.social, Siyao, Marie, and @barbaraplank.bsky.social on receiving the SAC Highlights award!

5 months ago

Congrats to our team member Diego Frassinelli on the SAC Highlights award!

5 months ago

Awesome! We're also creating one currently and have included yours as a starter :)

8 months ago
Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state of Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken

At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.

8 months ago

UPDATE: Our poster presentation got moved to Tuesday, 16:00–17:30 (session 10)! #ACL2025NLP

8 months ago

Unsure which presentations to attend at #ACL2025? 🛎️🗣️

8 months ago

👥 @boleima.bsky.social, Yuting Li, Wei Zhou, Ziwei Gong, @janetlauyeung.bsky.social, Katja Jasinskaja, @annefriedrich.bsky.social, Julia Hirschberg, Frauke Kreuter, @barbaraplank.bsky.social

8 months ago

👥 @boleima.bsky.social, Berk Yoztyurk, @carohaensch.bsky.social, @xinpeng.bsky.social, Markus Herklotz, Frauke Kreuter, @barbaraplank.bsky.social, @assenmacher.bsky.social

8 months ago

๐Ÿ“ Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter
๐Ÿ”Ž 263 languages, 10 similarity measures, 3 NLP tasks
๐Ÿ‘ฅ @verenablaschke.bsky.socialย Masha Fedzechkina @maartjeterhoeve.bsky.social
๐Ÿ”— arxiv.org/abs/2501.14491
๐Ÿ“ Findings โ€“ long

8 months ago

๐Ÿ“Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?
๐Ÿ”ŽAnalyzing how human-like LLMs are when taking reading, history, and economics tests
๐Ÿ‘ฅ @saeub.bsky.social , Diego Frassinelli, @barbaraplank.bsky.social
๐Ÿ”— arxiv.org/abs/2506.09796
๐Ÿ“BEA workshop - Long

8 months ago

๐Ÿ“ GerMedIQ: A Resource for Simulated and Synthesized Anamnesis Interview Responses in German
๐Ÿ”Ž We release a novel German anamnesis question-response dataset with human-simulated and LLM-augmented responses.
๐Ÿ‘ฅ @JHofenbitzer et al.
๐Ÿ”— github.com/Jhofenbitzer...
๐Ÿ“SRW - Long

8 months ago

๐Ÿ“Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set
๐Ÿ”ŽDo LLMs encode and generalize discourse knowledge across languages?
๐Ÿ‘ฅ @florian-eichin.com @janetlauyeung.bsky.social @mhedderich.bsky.social @barbaraplank.bsky.social
๐Ÿ”— arxiv.org/abs/2503.10515
๐Ÿ“Main - Long

8 months ago

๐Ÿ“LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
๐Ÿ”ŽWe present a large-scale study of whether LLM judgments can be reliably used as proxies for human judgments
๐Ÿ‘ฅAnna Bavaresco et al.
๐Ÿ”— arxiv.org/abs/2406.18403
๐Ÿ“Main - Short

8 months ago

๐Ÿ“ What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
๐Ÿ‘ฅ @mhedderich.bsky.social Anyi Wang @raoyuan.bsky.social @florian-eichin.com Jonas Fischer @barbaraplank.bsky.social โ€จ
๐Ÿ”— arxiv.org/abs/2504.158...โ€จ
๐Ÿ“Main - Long

8 months ago

๐Ÿ“A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI
๐Ÿ‘ฅ @beiduo.bsky.social Siyao Peng @annakorhonen.bsky.social @barbaraplank.bsky.social
๐Ÿ”— arxiv.org/abs/2412.13942
๐Ÿ“ACL25 Findings-Long

8 months ago

๐Ÿ“Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
๐Ÿ”ŽWe study the relationship between circuits for highly compositional and functionally related tasks
๐Ÿ‘ฅ@pmondorf.bsky.social Sondre Wold @barbaraplank.bsky.social
๐Ÿ”— arxiv.org/abs/2410.01434
๐Ÿ“Main-Long

8 months ago

๐Ÿ“Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
๐Ÿ”ŽWe review existing datasets for evaluating LLMsโ€™ pragmatic capabilities, outlining key challenges and promising future directions
๐Ÿ”— arxiv.org/abs/2502.12378
๐Ÿ“Main - Long

8 months ago

๐Ÿ“Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study
๐Ÿ”ŽThis study evaluates LLMs in generating German public opinions using open-ended survey data
๐Ÿ”— arxiv.org/abs/2412.13169
๐Ÿ“Main - Long

8 months ago

Headed to ACL? MaiNLP & our most recent work will be there too 👥📄
Come see what we’ve been working on!

8 months ago
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions through subnetworks that can be composed to perform mo...

📄 [ACL 2025 main] Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models (doi.org/10.48550/arX...)

9 months ago
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks There is an increasing trend towards evaluating NLP models with LLMs instead of human judgments, raising questions about the validity of these evaluations, as well as their reproducibility in the case...

📄 [ACL 2025 main] LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks (doi.org/10.48550/arX...)

9 months ago
Correlations between transfer results per experiment (parsing, POS tagging, topic classification with different input representations) and similarity measures. The results vary a lot across experiments and measures – some are described in the next posts.

At #ACL2025NLP I'll present our analysis of the effect of linguistic similarity on cross-lingual transfer! We looked at how 10 similarity measures correlate w/ transfer results btwn 263 languages across 3 NLP tasks. Different similarity measures matter for diff. experiments (no one-size-fits-all)!

9 months ago
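The post reports correlating 10 similarity measures with transfer results across language pairs but does not name the correlation coefficient. As a rough illustration, the sketch below assumes Spearman's rank correlation, built on a plain Pearson helper. The measure names `measure_a` and `measure_b` and all numbers are invented toy values, not measures or results from the paper.

```python
# Hypothetical sketch: rank-correlate language-similarity measures with
# cross-lingual transfer scores (toy numbers, not the paper's data).
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# One entry per (source, target) language pair, in the same order everywhere.
transfer_scores = [85.0, 80.0, 62.0, 40.0]
similarity_measures: Dict[str, List[float]] = {
    "measure_a": [0.9, 0.7, 0.4, 0.1],  # hypothetical measure
    "measure_b": [0.2, 0.8, 0.5, 0.3],  # hypothetical measure
}

for name, sims in similarity_measures.items():
    print(f"{name}: rho = {spearman(sims, transfer_scores):.2f}")
# measure_a: rho = 1.00
# measure_b: rho = -0.20
```

A rank-based coefficient is used here because it only assumes a monotonic relationship between similarity and transfer, not a linear one. Swapping in `pearson` directly would be equally easy if linearity is the question of interest.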

🤔 Can LLMs read between the lines?

Another of our #ACL2025 papers surveys resources on how LLMs handle pragmatic phenomena such as implicatures, deixis, and more. We map out a new landscape for pragmatics research in both LLMs and linguistics.

๐Ÿ“„ arxiv.org/abs/2502.12378
🧠💬 #LLMs #Pragmatics

9 months ago