
ACL 2025, the world’s largest NLP conference with almost 2,000 papers presented, just took place in Vienna! 🎓✨ Here is a quick snapshot of the event via a short interview with one of the authors whose work caught my attention.
🎥 Watch: youtu.be/GBISWggsQOA

7 months ago 3 2 0 0

ACL paper: aclanthology.org/2023.acl-lon...
Models: github.com/Heidelberg-N...
Read more: cl.uni-heidelberg.de/nlpgroup/new...
Morphological Analysis Demo: huggingface.co/spaces/bowph...
Machine Translation Demo: huggingface.co/spaces/bowphs/
Best Thesis Award: www.gscl.org/en/activitie...

7 months ago 2 0 0 0

I am honored to receive the 2025 #GSCL Best Thesis Award at #KONVENS in Hildesheim for my Master’s thesis, which investigates multilinguality and develops language models for Ancient Greek and Latin. Thank you to my mentors and collaborators. I look forward to what comes next.

7 months ago 4 1 1 1

Looking at Bruegel's Tower of Babel in Vienna makes you wonder: How can multilingual language models overcome language barriers? Find out tomorrow!
📍 Level 1 (ironic, right?), Room 1.15-1
🕐 2 PM
#ACL2025NLP

8 months ago 3 0 0 1

Read the full paper here: arxiv.org/pdf/2506.01629

Reach out if you have any questions or if you are attending ACL and want to say hi. 🙋

10 months ago 2 0 0 0
Table comparing text generations between early and late checkpoints for the concepts "earthquake" and "joy". Early checkpoint generations show language-specific text, while late checkpoint generations demonstrate a shift toward "language-agnostic" (= English) text.

This phenomenon has a visible effect on text generation: In BLOOM-560m, activating 'earthquake' neurons derived from Spanish data at checkpoint 10,000 generates Spanish text. At checkpoint 400,000, the same method yields English text!
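The mechanism behind this experiment can be sketched without the real model. Below is a minimal, self-contained toy (plain NumPy, not BLOOM; weights and neuron indices are made up) showing how boosting selected hidden units changes a network's downstream output, which is the core of the "activate expert neurons, then generate" intervention:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))  # toy "MLP up-projection"
W2 = rng.normal(size=(16, 8))  # toy "MLP down-projection"

def mlp(x, boost_ids=None, boost=10.0):
    """Toy MLP forward pass; optionally amplify selected hidden units,
    mimicking an activation intervention on 'expert' neurons."""
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden activations
    if boost_ids is not None:
        h = h.copy()
        h[:, boost_ids] += boost  # the intervention
    return h @ W2

x = rng.normal(size=(1, 8))
baseline = mlp(x)
steered = mlp(x, boost_ids=[3, 7])  # hypothetical 'earthquake' units

# The intervention measurably shifts the output vector.
print(np.allclose(baseline, steered))  # False
```

In the real experiment the boost would be applied inside the language model's MLP layers (e.g. via forward hooks), and the output difference shows up as a change in the generated language rather than a raw vector shift.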

10 months ago 2 0 1 0
Average overlap proportion of expert neurons across layers and training checkpoints. Later checkpoints exhibit more shared neurons, particularly in the middle layers.

This is not a bug, it's a feature! These layers are repurposing the space to form cross-lingual abstractions.
We track this by examining how specific concepts (like "earthquake" or "joy") align across languages.
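As a hedged illustration of the overlap statistic (the expert-neuron sets below are invented for the example, not taken from the paper), the per-layer quantity can be computed like this:

```python
def overlap_proportion(experts_a: set, experts_b: set) -> float:
    """Proportion of one language's expert neurons that are also
    experts for another language (illustrative definition)."""
    if not experts_a:
        return 0.0
    return len(experts_a & experts_b) / len(experts_a)

# Hypothetical 'earthquake' expert-neuron IDs in one layer:
early_es, early_en = {1, 2, 3, 4}, {5, 6, 7, 8}  # early checkpoint: disjoint
late_es, late_en = {1, 2, 3, 4}, {2, 3, 4, 9}    # late checkpoint: shared

print(overlap_proportion(early_es, early_en))  # 0.0  -> language-specific
print(overlap_proportion(late_es, late_en))    # 0.75 -> cross-lingual sharing
```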

10 months ago 2 0 1 0

We ask a probing classifier: "Given this hidden state from layer l, what is the language of the source text?" The results are striking: earlier checkpoints consistently solve this with high accuracy across layers. Later checkpoints, however, exhibit clear performance drops.
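A minimal sketch of such a probe, assuming hidden states have already been extracted into a per-layer feature matrix (toy data below; scikit-learn supplies the linear classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_language_id(hidden_states, languages):
    """Fit a linear probe predicting the source language from one
    layer's hidden states; return held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        hidden_states, languages, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# Toy stand-in: two "languages" with separable hidden-state statistics,
# as in an early checkpoint where representations are language-specific.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 16)), rng.normal(3, 1, (100, 16))])
y = np.array([0] * 100 + [1] * 100)
print(probe_language_id(X, y))  # near 1.0 on this separable toy data
```

In the paper's setting, this accuracy would drop at later checkpoints as representations become language-agnostic.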

10 months ago 2 0 1 0
Probing classifier performance comparison between early and late checkpoint across layers. While the early checkpoint shows uniformly high performance, the later checkpoint exhibits relatively high variance across layers.

How and when do multilingual LMs achieve cross-lingual generalization during pre-training? And why do later, supposedly more advanced, checkpoints lose some of their language-identification ability in the process? Our #ACL2025 paper investigates.

10 months ago 3 2 1 1

Debates aren’t always black and white: opposing sides often share common ground. These partial agreements are key to meaningful compromises.
Presenting “Perspectivized Stance Vectors” (PSVs), an interpretable method for identifying nuanced (dis)agreements.

📜 arxiv.org/abs/2502.09644
🧵 More details below

1 year ago 4 3 1 0
An Annotated Dataset of Errors in Premodern Greek and Baselines for Detecting Them Creston Brooks, Johannes Haubold, Charlie Cowen-Breen, Jay White, Desmond DeVaul, Frederick Riemenschneider, Karthik R Narasimhan, Barbara Graziosi. Findings of the Association for Computational Lingu...

Read the full paper: aclanthology.org/2025.finding...

Work by Creston Brooks, Johannes Haubold, Charlie Cowen-Breen, Jay White, Desmond DeVaul, me, Karthik Narasimhan, and Barbara Graziosi

11 months ago 7 1 0 0

Our work brings new computational methods to a field traditionally dominated by manual scholarship, potentially accelerating the discovery of textual errors that have remained hidden for centuries.

11 months ago 3 0 1 0

Perhaps most surprising: even powerful models like GPT-4 performed barely above random chance on this specialized task! This highlights the limitations of general-purpose LLMs when dealing with ancient text restoration.

11 months ago 3 0 1 0

We tested several error detection methods and found that our discriminator-based approach outperforms all others. Interestingly, scribal errors (the oldest type) are more difficult to detect than print or digitization errors across ALL methods.

11 months ago 2 0 1 0

Prior work has only evaluated error detection on artificially generated errors. Our dataset contains REAL errors that naturally accumulated over centuries: the subtle mistakes that survived precisely because they often appear perfectly reasonable.

11 months ago 1 0 1 0

Creating this dataset was painstaking! Our domain expert spent over 100 hours reviewing potential errors, categorizing them as scribal errors (from manuscript copying), print errors (from creating editions), or digitization errors (from converting texts to digital form).

11 months ago 2 0 1 0

In "An Annotated Dataset of Errors in Premodern Greek and Baselines for Detecting Them," we introduce the first expert-labeled dataset of real errors in ancient texts, enabling proper evaluation of error detection methods on authentic textual problems.

11 months ago 3 1 1 0

What did Aristotle actually write? We think we know, but reality is messy. As ancient Greek texts traveled through 2,500 years of history, they were copied and recopied countless times, accumulating subtle errors with each generation. Our new #NAACL2025 paper tackles this fascinating challenge.

11 months ago 13 4 1 2