Posts by Computational Linguistics @ UZH

🎤 Dr. Valentina Pyatkin is a postdoctoral researcher at the Allen Institute for AI, contributing to OLMo and working on post-training for LLMs, and part-time affiliated with the ETH AI Center as part of the Swiss AI Initiative. @valentinapy.bsky.social

3/3

4 days ago

🎤 Prof. Dr. Alexandra Birch is Professor and Chair of Multilingual NLP at the University of Edinburgh and co-founder & Chief Scientist of Aveni.ai. Her research spans multilingual and multimodal NLP, ethics, and explainability, and she has led major EU projects including EuroLLM and GoURMET.

2/3

4 days ago
SwissText 2026: 10 June 2026 | UZH Campus Oerlikon

🗓️ SwissText 2026 keynote speakers announced & registration open!

We are delighted to welcome Prof. Dr. Alexandra Birch and Dr. Valentina Pyatkin as our keynote speakers.

📋 Register here: ema.uzh.ch/RHK4W
Early-bird rates available throughout April, with additional student discounts.
#NLProc

1/3

4 days ago
Information Representation Fairness in Long-Document Embeddings: The Peculiar Interaction of Positional and Language Bias To be discoverable in an embedding-based search process, each part of a document should be reflected in its embedding representation. To quantify any potential reflection biases, we introduce a permutation-based evaluation framework. With this, we observe that state-of-the-art embedding models exhibit systematic positional and language biases when documents are longer and consist of multiple segments. Specifically, early segments and segments in higher-resource languages like English are over-represented, while later segments and segments in lower-resource languages are marginalized. In our further analysis, we find that the positional bias stems from front-loaded attention distributions in pooling-token embeddings, where early tokens receive more attention. To mitigate this issue, we introduce an inference-time attention calibration method that redistributes attention more evenly across document positions, increasing the discoverability of later segments. Our evaluation framework and attention calibration are available at https://github.com/impresso/fair-sentence-transformers
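A toy sketch of how a permutation-based check can expose positional bias: the same segments are embedded in every order, and the similarity between each segment and the whole-document embedding is averaged per position, so segment content cancels out and only position remains. Everything here is an assumption for illustration: `embed_document` is a hypothetical stand-in that pools segment vectors with front-loaded weights 1/(pos+1) instead of a real embedding model, cosine similarity is the assumed score, and random 256-dimensional vectors play the role of segment embeddings. It is not the framework released in the linked repository.

```python
import itertools
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def embed_document(segment_vecs):
    # Hypothetical stand-in for a long-document embedding model: a weighted sum
    # of segment vectors with weights 1/(pos+1), so early segments dominate.
    doc = [0.0] * len(segment_vecs[0])
    for pos, vec in enumerate(segment_vecs):
        weight = 1.0 / (pos + 1)
        for d, x in enumerate(vec):
            doc[d] += weight * x
    return doc


def positional_scores(segment_vecs, embed=embed_document):
    # Embed every ordering of the same segments and, for each position, average
    # the similarity between the segment placed there and the document embedding.
    k = len(segment_vecs)
    totals = [0.0] * k
    orders = list(itertools.permutations(range(k)))
    for order in orders:
        ordered = [segment_vecs[i] for i in order]
        doc = embed(ordered)
        for pos, vec in enumerate(ordered):
            totals[pos] += cosine(vec, doc)
    # Every segment visits every position equally often, so content averages out.
    return [t / len(orders) for t in totals]


rng = random.Random(0)
segments = [[rng.gauss(0.0, 1.0) for _ in range(256)] for _ in range(4)]
scores = positional_scores(segments)
print([round(s, 3) for s in scores])
```

Under the front-loaded stand-in, the first position scores clearly above the last; with an unweighted sum as `embed`, every position receives the same average score. A real evaluation would presumably replace `embed_document` with an actual embedding model applied to the concatenated segments.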

🔵 (Findings:) Elias Schuhmacher, Andrianos Michail, @nlopitz.bsky.social, @ricosennrich.bsky.social, @simon-clematide.bsky.social. Information Representation Fairness in Long-Document Embeddings: The Peculiar Interaction of Positional and Language Bias. arxiv.org/abs/2601.16934

7/7

1 week ago

🔵 Apertus: Democratizing Open and Compliant LLMs for Global Language Environments. arxiv.org/abs/2509.14233

6/7

1 week ago
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization Tokenization is the first -- and often least scrutinized -- step of most NLP pipelines. Standard algorithms for learning tokenizers rely on frequency-based objectives, which favor languages dominant in the training data and consequently leave lower-resource languages with tokenizations that are disproportionately longer, morphologically implausible, or even riddled with <UNK> placeholders. This phenomenon ultimately amplifies computational and financial inequalities between users from different language backgrounds. To remedy this, we introduce Parity-aware Byte Pair Encoding (BPE), a variant of the widely-used BPE algorithm. At every merge step, Parity-aware BPE maximizes the compression gain of the currently worst-compressed language, trading a small amount of global compression for cross-lingual parity. We find empirically that Parity-aware BPE leads to more equitable token counts across languages, with negligible impact on global compression rate and no substantial effect on language-model performance in downstream tasks.
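A minimal sketch of the merge rule the abstract describes, under stated assumptions: compression is measured as characters per token, words are split on whitespace and start as single characters, and the frequency of a pair within the worst-compressed language serves as a proxy for that language's compression gain. The function names and toy corpora are hypothetical; this is not the authors' released code.

```python
from collections import Counter


def char_tokens(texts):
    # Split each text on whitespace and start every word as a list of characters.
    return [list(word) for text in texts for word in text.split()]


def chars_per_token(seqs):
    # Higher is better: more characters covered by each token.
    n_chars = sum(len(tok) for seq in seqs for tok in seq)
    n_tokens = sum(len(seq) for seq in seqs)
    return n_chars / n_tokens if n_tokens else float("inf")


def pair_counts(seqs):
    # Count adjacent token pairs inside each word.
    counts = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def apply_merge(seq, pair):
    # Replace non-overlapping occurrences of `pair` (left to right) by one token.
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def parity_aware_bpe(corpora, num_merges):
    """Hypothetical sketch: corpora maps a language code to a list of texts."""
    seqs = {lang: char_tokens(texts) for lang, texts in corpora.items()}
    merges = []
    for _ in range(num_merges):
        # 1. Find the worst-compressed language (min keeps the first on ties).
        worst = min(seqs, key=lambda lang: chars_per_token(seqs[lang]))
        counts = pair_counts(seqs[worst])
        if not counts:
            break  # simplification: stop if that language has nothing left to merge
        # 2. Take its most frequent pair (ties broken alphabetically).
        pair = min(counts, key=lambda p: (-counts[p], p))
        merges.append(pair)
        # 3. The merge joins the shared vocabulary and is applied to every language.
        seqs = {lang: [apply_merge(s, pair) for s in ss] for lang, ss in seqs.items()}
    return merges, seqs


merges, seqs = parity_aware_bpe({"en": ["abc abc abc abc"], "xx": ["de"]}, num_merges=3)
print(merges)  # [('a', 'b'), ('d', 'e'), ('ab', 'c')]
```

Plain frequency-based BPE on the same toy data would learn `('ab', 'c')` second, because that pair is four times as frequent as `('d', 'e')`; the parity rule instead spends the second merge on the worse-compressed language, giving up a little global compression for balance.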

🔵 @negarforoutan.bsky.social, Clara Meister, Debjit Paul, @joelniklaus.bsky.social, @sinaahmadi.bsky.social, Antoine Bosselut, @ricosennrich.bsky.social. Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization. arxiv.org/abs/2508.04796

5/7

1 week ago

🔵 Kevin Du, Clara Kümpel, @michellewastl.bsky.social, Alex Warstadt. It’s Not What You Say, It’s How You Say It: Evaluating LLM Responses to Expressions of Belief.

4/7

1 week ago

🔵 @zifanjiang.bsky.social, Youngjoon Jang, Liliane Momeni, Gül Varol, Sarah Ebling, Andrew Zisserman. Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing. arxiv.org/abs/2512.08094

3/7

1 week ago
SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents Recognizing semantic differences across documents, especially in different languages, is crucial for text generation evaluation and multilingual content alignment. However, as a standalone task it has received little attention. We address this by introducing SwissGov-RSD, the first naturalistic, document-level, cross-lingual dataset for semantic difference recognition. It encompasses a total of 224 multi-parallel documents in English-German, English-French, and English-Italian with token-level difference annotations by human annotators. We evaluate a variety of open-source and closed-source large language models as well as encoder models across different fine-tuning settings on this new benchmark. Our results show that current automatic approaches perform poorly compared to their performance on monolingual, sentence-level, and synthetic benchmarks, revealing a considerable gap for both LLMs and encoder models. We make our code and datasets publicly available.

🔵 @michellewastl.bsky.social, @vamvas.bsky.social, @ricosennrich.bsky.social. SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents. arxiv.org/abs/2512.07538

2/7

1 week ago
🔵 Michelle Wastl, Jannis Vamvas, Rico Sennrich. SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents.

🔵 Zifan Jiang, Youngjoon Jang, Liliane Momeni, Gül Varol, Sarah Ebling, Andrew Zisserman. Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing.

🔵 Kevin Du, Clara Kümpel, Michelle Wastl, Alex Warstadt. It’s Not What You Say, It’s How You Say It: Evaluating LLM Responses to Expressions of Belief.

🔵 Negar Foroutan, Clara Meister, Debjit Paul, Joel Niklaus, Sina Ahmadi, Antoine Bosselut, Rico Sennrich. Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization.

🔵 Apertus: Democratizing Open and Compliant LLMs for Global Language Environments. https://lnkd.in/eFTh-5m7

🔵 (Findings:) Elias Schuhmacher, Andrianos Michail, Juri Opitz, Rico Sennrich, Simon Clematide. Information Representation Fairness in Long-Document Embeddings: The Peculiar Interaction of Positional and Language Bias.

Looking forward to the ACL 2026 conference in San Diego, California!
Several accepted papers have involvement from our department:

1/7

1 week ago
UZH: PhD in Language AI / Natural Language Processing You will be joining the Department of Computational Linguistics, which has 6 Research Groups and around 70 postdoctoral and student researchers in the areas of Text Technologies, Phonetics and Speech ...

My lab is recruiting one PhD student and one post-doctoral researcher to start as early as this summer! Apply by April 1 / March 31 to be among the first candidates considered.

jobs.uzh.ch/job-vacancie...

jobs.uzh.ch/job-vacancie...

4 weeks ago

📣 SwissText 2026 submission deadline has been extended until 24 March 2026 (AoE)!

Submit your work here: openreview.net/group?id=Swi...

1 month ago

Available tracks:
🔹 Scientific Track (archival research)
🔹 Corpus Track (focus on datasets)
🔹 Applied Track (real-world NLP applications)
🔹 Demonstration Track (live system demos)

Less than 3 weeks left – don't miss your chance to present in Zurich!

(3/3)

1 month ago
SwissText 2026 Conference Welcome to the OpenReview homepage for SwissText 2026 Conference

Still working on that #NLProc paper? Time to submit! We welcome work on foundational research, #Swiss-centric language resources, and real-world NLP applications submitted via #OpenReview:

openreview.net/group?id=Swi...

(2/3)

1 month ago
2026 Call for Papers | SwissText

📣 SwissText 2026 – 2nd Call for Papers!

🎯 Special theme: #ReproducibleNLP
📅 Deadline approaching: 17 March 2026
📍 Zurich, Switzerland, 10 June 2026

www.swisstext.org/call-for-pap...

(1/3)

1 month ago
Rico Sennrich takes over dual Professorship

Congratulations to @ricosennrich.bsky.social on his dual professorship at our department and at the Department of Informatics.
www.ifi.uzh.ch/en/news/rico...

2 months ago
SwissText 2026 Conference Welcome to the OpenReview homepage for SwissText 2026 Conference

All submissions are handled via #OpenReview

openreview.net/group?id=Swi...

(4/4)

2 months ago

We welcome submissions across multiple tracks, including:

🔹 Scientific Track (archival research)
🔹 Corpus Track (focus on datasets)
🔹 Applied Track (real-world NLP applications)
🔹 Demonstration Track (live system demos)

(3/4)

2 months ago

Whether you’re advancing foundational #NLProc research, releasing new Swiss-centric #language resources, or showcasing cutting-edge NLP applications, we’d love to see your work!

(2/4)

2 months ago
2026 Call for Papers | SwissText

📣 SwissText 2026 – Call for Papers is open!

🎯 Special theme: #ReproducibleNLP
📅 Submission deadline: 17 March 2026
📍 Zurich, Switzerland · 10 June 2026

www.swisstext.org/call-for-pap...

(1/4)

2 months ago
2026 | SwissText

📣 We’re pleased to announce that our department will co-organize the 11th edition of the SwissText conference!
#SwissText2026 will take place on 10 June 2026, hosted at the University of Zurich (Campus Oerlikon). Save the date!

www.swisstext.org/current/

#NLProc #UZHai #CompLing

2 months ago
UZH Postdoc Team Award: When Spirituality Meets AI
Group photo of the winning team consisting of Patrick Montjouridès, Anastassia Shaitarova, Fabian Winiger and Yingqiang Gao.

Congratulations to @shaitarova.bsky.social and Yingqiang Gao from our department, and to their colleagues Fabian Winiger (Department of Theology) and Patrick Montjouridès (Institute of Education) for winning the UZH Postdoc Team Award!
www.news.uzh.ch/en/articles/...

2 months ago
Best Talk Award CPL 2025

Congratulations to our PhD student @cuiding.bsky.social for winning the Best Talk Award at the Computational Psycholinguistics Meeting 2025 (CPL 2025) lnkd.in/euTX89gi!

4 months ago

Take a look at how we challenge state-of-the-art NLP systems to recognize token-level semantic differences across languages with our new SwissGov-RSD dataset! @vamvas.bsky.social @ricosennrich.bsky.social

Paper: arxiv.org/pdf/2512.075...
Dataset: huggingface.co/datasets/Zur...

#NLProc

4 months ago
Iuliia Thorbecke wearing her doctor's hat, which was decorated by other PhD students of our department

Congratulations to Iuliia Thorbecke (Nigmatulina), who just passed the viva of her PhD on using contextual information to improve automatic speech recognition, for example using radar flight information to better transcribe air traffic communication.

4 months ago

Congratulations to Ann-Sophie Gnehm, who just passed the viva of her PhD on "Bridging the Gap between Job Ad Texts and Labor Market Ontologies with Adaptation and Contextualization Techniques". With many thanks to the external examiners @dirkhovy.bsky.social and Einat Minkov.

5 months ago

UZH group picture at #EMNLP2025!

If you're here, catch us for a chat!

5 months ago

🎉 Terminology Shared Task @WMT25: Paper Out 🎉
Highlights:
- sentence translation seems solvable, but document translation is still challenging
- better systems benefit more from proper terminologies
- term-based metrics correlate poorly with general translation quality

www2.statmt.org/wmt25/pdf/20...

5 months ago
What Does and What Should AI Know? Wednesday, 12 November 2025 | DSI Event Room (SOC E-010)

Upcoming panel discussion on Nov 12 moderated by our @tevuko.bsky.social: What Does and What Should AI Know?

More info and registration link: www.ema.uzh.ch/de/register/...

5 months ago

📢 Announcing the First Workshop on Multilingual and Multicultural Evaluation (MME) at #EACL2026 🇲🇦

MME focuses on resources, metrics & methodologies for evaluating multilingual systems! multilingual-multicultural-evaluation.github.io

📅 Workshop Mar 24–29, 2026
🗓️ Submit by Dec 19, 2025

6 months ago