Miryam de Lhoneux (@mdlhx) Bsky

BPE-knockout just got outperformed by an algorithm that modifies BPE tokenisers in a feedback loop to make them absorb more and more constraints. It doesn't even need more data to do that. It uses the tokeniser itself as a dataset. 🧵

1 week ago 1 2 1 0

Right, still, this is insane

2 weeks ago 1 0 0 0

(If I were a reviewer for TX and received such a DR, I would immediately withdraw from the reviewer pool)

2 weeks ago 1 0 1 0

yikes, if X is what I think it is, TX is a sinking ship

2 weeks ago 1 0 1 0

GitHub - davidjurgens/hallucinated-reference-finder

If you're reviewing ARR papers and want a tool to help you spot potential hallucinated references, I cooked this up for the ACL SACs and thought I would share it with the broader community github.com/davidjurgens...

3 weeks ago 24 10 3 1

I'm not there unfortunately but @wpoelman.bsky.social and Thomas Bauwens are

3 weeks ago 1 0 0 0

LAGoM will present several papers at #EACL2026 in Rabat next week! Our work at this year’s conference spans tokenisation, multilingual evaluation, and model design.

1 month ago 1 1 1 0

📢I'm organizing a BoF session at #EACL2026 called Tokenization & Beyond, aiming to gather researchers exploring tokenization and alternatives such as byte-level and pixel-based approaches. Sign up using the form if you're interested! #NLProc @eaclmeeting.bsky.social

1 month ago 10 9 1 1

📣DEADLINE EXTENDED!📣

Need a few more days to perfect your paper for #EAMT2026? You got it.
We have pushed the submission deadline back!

🗓️ New Deadline: 25th March 2026 23:59 CEST

Breathe, revise, and submit: easychair.org/my/conferenc...

And don't forget to anonymize your paper 👀🕵️‍♀️

1 month ago 0 2 0 0

Calls for Papers - EAMT 2026: European Association for Machine Translation Conference in Tilburg, Netherlands

⏳ ONE WEEK LEFT!
The #EAMT2026 submission deadline is closing next Wednesday (March 18).
Whether it's a deep-dive into LLM evaluations, low-resource MT, or a new user study, we want to see it. Let’s get those papers in! 🏃‍♂️💨
🔗 Submission info here: eamt2026.org/calls-for-pa...
👇

1 month ago 1 1 1 0

i'm doing it - i'm writing an MT proposal 👀

1 month ago 7 0 0 0

oof. I set a max load to 30 because in the last ARR cycle I had 60, and I had an AC for 15 submissions who did nothing at all (until 5 days past the meta-review ddl, they submitted meta-reviews for papers for which i already had an emergency AC who had written a meta-review)

1 month ago 0 0 0 0

good morning to you and to my ACs who submitted all their meta-reviews before the deadline

1 month ago 16 0 1 0

As someone with degrees in both, this is spot on 🎯

1 month ago 1 0 0 0

Today one of my bullet-point students told me that for another exam, the professor specifically said they wanted lists and not full sentences. For an open-book exam. I don't get it. So apparently colleagues are making things worse

2 months ago 1 0 1 0

Hi #NLP Bluesky, the Multilinguality track at #ACL2026NLP @aclmeeting.bsky.social needs emergency reviewers. If you can complete one or two reviews before February 15, please reach out. Thank you!

2 months ago 0 6 0 0

I think something is particularly wrong this cycle. I've heard multiple stories of people being assigned more than their load, even a case of someone whose load was supposed to be 0. Don't know if it's a bug or intentional

2 months ago 2 0 1 0

thx, already saved this to my zotero and plan to read it soon! :) looks like very interesting work!

2 months ago 2 0 0 0

I'm glad I'm not in the French-speaking system just for that reason

2 months ago 1 0 1 0

A photograph of sunny Copenhagen in the summer!

📢 I am hiring a highly-motivated Ph.D student at the University of Copenhagen to work on tokenization-free NLP.

Read our previous work on this topic: aclanthology.org/2025.emnlp-m...
aclanthology.org/2023.emnlp-m...
openreview.net/forum?id=FkS...

Apply by March 8: employment.ku.dk/phd/?show=1563

2 months ago 20 9 0 0

Form and Meaning in Intrinsic Multilingual Evaluations Intrinsic evaluation metrics for conditional language models, such as perplexity or bits-per-character, are widely used in both mono- and multilingual settings. These metrics are rather straightforwar...

New EACL paper (with @mdlhx.bsky.social)! We tested if comparing perplexity of parallel data across languages is fair. Turns out: it depends. We show the choice of test set (even with consistent meaning) can flip conclusions about which language is easier to model.

Paper: arxiv.org/abs/2601.10580

2 months ago 10 3 0 0

👀

2 months ago 1 0 1 0

Today, the ACL Anthology switched to a new system for how author pages work. From now on, ORCID iDs will be the main mechanism for matching papers to the correct author. 🧵⤵️

2 months ago 9 5 1 0

Reminder that we have an alt-ARR slack workspace where ACs and SACs can support each other through the sometimes confusing process of the ARR cycle! Post or DM me a good email address for a Slack invitation and I will add you. #ACL2026

3 months ago 5 2 0 0

I wasn't being serious, hehe. I think it's just inevitable. My exams are open-book. I say it repeatedly in class, I say it in an info PDF about the exam, and it's written on the official course page. I still have students who write on their exam paper "I didn't know it was open book"

3 months ago 2 0 1 0

"Think about it step by step"

3 months ago 1 0 1 0

I think you also have to write, before each question, "read the instructions first". Maybe even have them rewrite the instructions before answering the questions

3 months ago 2 0 1 0

Master's and "advanced" master's

3 months ago 1 0 1 0

That's what I do, and yet every year I have students who write lists anyway

3 months ago 1 0 1 0

Congratulations Dr.!!!! 🎉

3 months ago 1 0 0 0
