MaiNLP talks/posters/events at EACL
MaiNLP is happy to be part of @eaclmeeting.bsky.social with several papers, talks, a panel, and a workshop. Looking forward to seeing you in Rabat! #EACL2026
We are honoured to welcome Prof. Barbara Plank (@barbaraplank.bsky.social) from @mainlp.bsky.social @cislmu.bsky.social as our keynote speaker.
LoResLM @eaclmeeting.bsky.social
New paper accepted at @eaclmeeting.bsky.social
2026:
Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions
with
@mhedderich.bsky.social
@amodarressi.bsky.social
Hinrich Schuetze
& Benjamin Roth.
Preprint: arxiv.org/abs/2512.12775
In our new paper, "A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages", we go beyond final-answer accuracy to analyze multilingual reasoning along three dimensions: performance, consistency, and faithfulness.
New paper
We find script (e.g. Cyrillic, Latin) to be a linear direction in the activation space of Whisper, enabling transliteration at test-time by adding such script directions to the activations, producing e.g. Cyrillic Japanese transcriptions.
VarDial @ EACL 2026, with important dates (see next post for text version). Photo CC-0.
VarDial 2026 will be colocated with @eaclmeeting.bsky.social! We're looking forward to your papers on NLP for similar languages, varieties and dialects :)
Deadline: Dec 19 (Jan 2 for pre-reviewed ARR papers)
sites.google.com/view/vardial...
Group photo at NeurIPS 2025 San Diego
Congrats to Pingjun, @beiduo.bsky.social, Siyao, Marie, and @barbaraplank.bsky.social for receiving the SAC Highlights award!
Congrats to our team member Diego Frassinelli on the SAC Highlights award!
Awesome! We're also creating one currently and have included yours as a starter :)
Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken
At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.
UPDATE: Our poster presentation got moved to Tuesday, 16:00–17:30 (session 10)! #ACL2025NLP
Unsure which presentations to attend at #ACL2025?
@boleima.bsky.social Yuting Li, Wei Zhou, Ziwei Gong, @janetlauyeung.bsky.social Katja Jasinskaja @annefriedrich.bsky.social Julia Hirschberg, Frauke Kreuter @barbaraplank.bsky.social
@boleima.bsky.social Berk Yoztyurk @carohaensch.bsky.social @xinpeng.bsky.social Markus Herklotz, Frauke Kreuter @barbaraplank.bsky.social @assenmacher.bsky.social
Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter
263 languages, 10 similarity measures, 3 NLP tasks
@verenablaschke.bsky.social Masha Fedzechkina @maartjeterhoeve.bsky.social
arxiv.org/abs/2501.14491
Findings - Long
Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?
Analyzing how human-like LLMs are when taking reading, history, and economics tests
@saeub.bsky.social, Diego Frassinelli, @barbaraplank.bsky.social
arxiv.org/abs/2506.09796
BEA workshop - Long
GerMedIQ: A Resource for Simulated and Synthesized Anamnesis Interview Responses in German
We release a novel German anamnesis question-response dataset with human-simulated and LLM-augmented responses.
@JHofenbitzer et al.
github.com/Jhofenbitzer...
SRW - Long
Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set
Do LLMs encode and generalize discourse knowledge across languages?
@florian-eichin.com @janetlauyeung.bsky.social @mhedderich.bsky.social @barbaraplank.bsky.social
arxiv.org/abs/2503.10515
Main - Long
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
We present a large-scale study of whether LLM judgments can be reliably used as proxies for human judgments
Anna Bavaresco et al.
arxiv.org/abs/2406.18403
Main - Short
What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
@mhedderich.bsky.social Anyi Wang @raoyuan.bsky.social @florian-eichin.com Jonas Fischer @barbaraplank.bsky.social
arxiv.org/abs/2504.158...
Main - Long
A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI
@beiduo.bsky.social Siyao Peng @annakorhonen.bsky.social @barbaraplank.bsky.social
arxiv.org/abs/2412.13942
ACL25 Findings - Long
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
We study the relationship between circuits for highly compositional and functionally related tasks
@pmondorf.bsky.social Sondre Wold @barbaraplank.bsky.social
arxiv.org/abs/2410.01434
Main - Long
Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
We review existing datasets for evaluating LLMs' pragmatic capabilities, outlining key challenges and promising future directions
arxiv.org/abs/2502.12378
Main - Long
Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study
This study evaluates LLMs in generating German public opinions using open-ended survey data
arxiv.org/abs/2412.13169
Main - Long
Headed to ACL? MaiNLP & our most recent work will be there too.
Come see what we've been working on!
[ACL 2025 main] Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models (doi.org/10.48550/arX...)
[ACL 2025 main] LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks (doi.org/10.48550/arX...)
Correlations between transfer results per experiment (parsing, POS tagging, topic classification with different input representations) and similarity measures. The results vary a lot across experiments and measures; some are described in the next posts.
At #ACL2025NLP I'll present our analysis of the effect of linguistic similarity on cross-lingual transfer! We looked at how 10 similarity measures correlate w/ transfer results btwn 263 languages across 3 NLP tasks. Different similarity measures matter for diff. experiments (no one-size-fits-all)!
Can LLMs read between the lines?
Another of our #ACL2025 papers surveys resources on how LLMs handle pragmatics like implicatures, deixis, and more. We map out a new landscape for both LLMs and linguistics in pragmatic research.
arxiv.org/abs/2502.12378
#LLMs #Pragmatics