→ it’s paradigm replacement, not task automation, that causes widespread job displacement
from substack.com/home/post/p-...
Posts by Raphaël Merx
cool analogy for AI & jobs:
- when ATMs came, the number of bank tellers rose, bc ATMs lowered the cost of running bank branches, so more branches opened
- but in the 2010s, the number of bank tellers plummeted, bc mobile banking made branches unnecessary
a *great* chart to teach confounding variables
This is some legit really impressive work!!
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
the paper www2.statmt.org/wmt25/pdf/20...
Whoa the #WMT25 results on MT Evaluation are wild! ChrF outperforms pretty much all neural metrics 🙀
They say it's because (1) test sets have become more challenging, (2) include more language pairs, (3) are longer, and (4) used ESA instead of MQM. But we need an ablation study!
kudos to whoever came up with that paper name 👌
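For anyone surprised that a non-neural metric can win: ChrF is just an averaged character n-gram F-score (Popović, 2015). A minimal from-scratch sketch below, simplified for illustration (default chrF settings: n = 1..6, beta = 2, whitespace removed; no chrF++ word n-grams, and details may differ from the sacrebleu implementation):

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams with whitespace removed, as chrF does by default."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over char n-gram precision/recall, averaged for n=1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r) * 100
```

No embeddings, no learned weights: surface character overlap only, which is exactly why its strong WMT25 showing against neural metrics calls for an ablation.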
in Vienna for ACL, presenting Tulun, a system for low-resource in-domain translation, using LLMs
Tuesday @ 4pm
Working w 2 real use cases: medical translation into Tetun 🇹🇱 & disaster relief speech translation in Bislama 🇻🇺
Cool paper, at the intersection of grammar and LLM interpretability.
I like that they use linguistic datasets for their experiments, then get results that can contribute to linguistics as a field too! (on structural priming vs L1/L2)
Thanks a lot! I didn't make it to Albuquerque unfortunately, but I hope to be in Vienna for ACL. Might see you there?
Many thanks to Adérito Correia (Timor-Leste INL), and my supervisors Hanna Suominen and Katerina Vylomova!
Paper at aclanthology.org/2025.loresmt... , video presentation at youtu.be/8zenieJWRyg
Our paper on who uses tetun.org, and what for, got published at the LoResMT 2025 workshop! An emotional paper for me, going back to the project that got me into a machine learning PhD in the first place.
We find that:
(1) a LOT of usage is for educational purposes (>50% of translated text)
→ contrasts sharply with Tetun corpora (e.g. MADLAD), which are dominated by news & religion.
Takeaway: don't evaluate MT on overrepresented domains (e.g. religion)! You risk misrepresenting the end-user experience.
(2) Translation into Tetun is in higher demand (by >2x) than translation from Tetun.
Takeaway for us MT folks: focus on translation into low-res langs, harder but more impactful.
(3) The vast majority of usage is on mobile (over 90% of users / over 80k devices).
Takeaway: publishing MT models in mobile apps is probably more impactful than setting up a website / HuggingFace space.
Very interesting findings, particularly the benefit (or lack thereof) of test-time scaling across domains
My favourite ICLR paper so far. Methodology, findings and their implications are all very cool.
In particular Fig. 2 + this discussion point:
Incredible paper, finding that large companies can game the LMArena through statistical noise (via many model submissions), over-sampling of their models, and overfitting to Arena-style prompts (without real gains on model reasoning)
The experiments they run to show this are pretty cool too!
Cool summary of issues with multilingual LLM eval, and potential solutions!
If you're doubtful of all these non-reproducible evals on translated multiple choice questions, this paper is for you
GlotEval - a unified framework for multilingual eval of LLMs, on 7 different tasks, by @tiedeman.bsky.social @helsinki-nlp.bsky.social
Just wish it supported eval of closed models (e.g. through LiteLLM?)
github.com/MaLA-LM/Glot...
PyConAU We are on Bluesky! Follow us and stay tuned! @pyconau.bsky.social
👋 Hey Bluesky!
We’ve just touched down and we’re excited to be here 🌤️🐍
This is the official PyCon AU account, your go-to space for updates, announcements, and all things Python in Australia✨
Hit that follow button and stay tuned because we’ve got some awesome things coming your way!
#PyConAU
AI dev tools, in particular agents: are they hype, useful, or both?
Perceptricon
The right thing to do, thanks for this *SEM
Super impactful, thank you for this! A natural sequel of Gatitos.
I'm esp. fond of your "researcher in the loop" method to ensure wide vocab coverage.
😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301
Huggingface: huggingface.co/datasets/goo...
Been hearing a lot about recency bias lately. Must be pretty important