→ it’s paradigm replacement, not task automation, that causes widespread job displacement
from substack.com/home/post/p-...
Posts by Raphaël Merx
cool analogy for AI & jobs:
- when ATMs came, the number of bank tellers rose, bc ATMs lowered the cost of running bank branches, so more branches opened
- but in the 2010s, the number of bank tellers plummeted, bc mobile banking made branches unnecessary
a *great* chart to teach confounding variables
This is some legit really impressive work!!
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
the paper www2.statmt.org/wmt25/pdf/20...
Whoa the #WMT25 results on MT Evaluation are wild! ChrF outperforms pretty much all neural metrics 🙀
They say it's because (1) test sets have become more challenging, (2) include more language pairs, (3) are longer, and (4) used ESA instead of MQM. But we need an ablation study!
kudos to whoever came up with that paper name 👌
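For anyone surprised that a non-neural metric can win: ChrF is just an averaged character n-gram F-score (Popović, 2015). A minimal from-scratch sketch below, simplified for illustration (default chrF settings: n = 1..6, beta = 2, whitespace removed; no chrF++ word n-grams, and details may differ from the sacrebleu implementation):

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams with whitespace removed, as chrF does by default."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over char n-gram precision/recall, averaged for n=1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r) * 100
```

No embeddings, no learned weights: surface character overlap only, which is exactly why its strong WMT25 showing against neural metrics calls for an ablation.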
in Vienna for ACL, presenting Tulun, a system for low-resource in-domain translation, using LLMs
Tuesday @ 4pm
Working w 2 real use cases: medical translation into Tetun 🇹🇱 & disaster relief speech translation in Bislama 🇻🇺
Cool paper, at the intersection of grammar and LLM interpretability.
I like that they use linguistic datasets for their experiments, then get results that can contribute to linguistics as a field too! (on structural priming vs L1/L2)
Thanks a lot! I didn't make it to Albuquerque unfortunately, but I hope to be in Vienna for ACL. Might see you there?
Many thanks to Adérito Correia (Timor-Leste INL), and my supervisors Hanna Suominen and Katerina Vylomova!
Paper at aclanthology.org/2025.loresmt... , video presentation at youtu.be/8zenieJWRyg
Our paper on who uses tetun.org, and what for, got published at the LoResMT 2025 workshop! An emotional paper for me, going back to the project that got me into a machine learning PhD in the first place.
We find that:
(1) a LOT of usage is for educational purposes (>50% of translated text)
→ contrasts sharply with Tetun corpora (e.g. MADLAD), which are dominated by news & religion.
Takeaway: don't evaluate MT on overrepresented domains (e.g. religion)! You risk misrepresenting the end-user experience.
(2) Translation into Tetun is in higher demand (by >2x) than translation from Tetun.
Takeaway for us MT folks: focus on translation into low-res langs, harder but more impactful.
(3) The vast majority of usage is on mobile (over 90% of users / over 80k devices).
Takeaway: publishing MT models in mobile apps is probably more impactful than setting up a website / HuggingFace space.
Very interesting findings, particularly the benefit (or lack thereof) of test-time scaling across domains
My favourite ICLR paper so far. Methodology, findings and their implications are all very cool.
In particular Fig. 2 + this discussion point:
Incredible paper, finding that large companies can game the LMArena through statistical noise (via many model submissions), over-sampling of their models, and overfitting to Arena-style prompts (without real gains on model reasoning)
The experiments they run to show this are pretty cool too!
Cool summary of issues with multilingual LLM eval, and potential solutions!
If you're doubtful of all these non-reproducible evals on translated multiple choice questions, this paper is for you
GlotEval - a unified framework for multilingual eval of LLMs, on 7 different tasks, by @tiedeman.bsky.social @helsinki-nlp.bsky.social
Just wish it supported eval of closed models (e.g. through LiteLLM?)
github.com/MaLA-LM/Glot...
PyConAU We are on Bluesky! Follow us and stay tuned! @pyconau.bsky.social
👋 Hey Bluesky!
We’ve just touched down and we’re excited to be here 🌤️🐍
This is the official PyCon AU account, your go-to space for updates, announcements, and all things Python in Australia✨
Hit that follow button and stay tuned because we’ve got some awesome things coming your way!
#PyConAU
AI dev tools, in particular agents: are they hype, useful, or both?
Perceptricon
The right thing to do, thanks for this *SEM
Super impactful, thank you for this! A natural sequel of Gatitos.
I'm esp. fond of your "researcher in the loop" method to ensure wide vocab coverage.
😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301
Huggingface: huggingface.co/datasets/goo...
Been hearing a lot about recency bias lately. Must be pretty important