I’m very happy to share that my latest paper on the 𝐚𝐜𝐪𝐮𝐢𝐬𝐢𝐭𝐢𝐨𝐧 𝐨𝐟 𝐯𝐞𝐫𝐛 𝐦𝐞𝐚𝐧𝐢𝐧𝐠, tested on models trained on child-directed (CDL) vs adult-directed (ADL) data, has been accepted for an 𝐨𝐫𝐚𝐥 𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 at the upcoming edition of 𝐂𝐨𝐠𝐒𝐜𝐢, which will take place at the end of July in Rio de Janeiro. 💃
Posts by Jaap Jumelet
Our paper has been accepted to #EACL2026 main conference!
Together with @jumelet.bsky.social and @arianna-bis.bsky.social, we study the effect of target language typology on the difficulty of state-of-the-art neural machine translation.
arXiv preprint: arxiv.org/abs/2602.03551
1/6 Our findings ⬇️
👀 Look what 🎅 has brought just before Christmas 🎁: a brand new Research Master in Natural Language Processing at @facultyofartsug.bsky.social @rug.nl
Program: www.rug.nl/masters/natu...
Applications (2026/2027) are open! Come and study with us (you will also learn why we have a 🐮 in our logo)
🧑🔬I’m recruiting PhD students in Natural Language Processing at @unileipzig.bsky.social Computer Science, together with @scadsai.bsky.social!
Topics include, but aren’t limited to:
🔎Linguistic Interpretability
🌍Multilingual Evaluation
📖Computational Typology
Please share!
#NLProc #NLP
📢Out now in NEJLT!📢
In each of these sentences, a verb that doesn't usually encode motion is being used to convey that an object is moving to a destination.
Given that these usages are rare, complex, and creative, we ask:
Do LLMs understand what's going on in them?
🧵1/7
Screenshot of a figure with two panels, labeled (a) and (b). The caption reads: "Figure 1: (a) Illustration of messages (left) and strings (right) in toy domain. Blue = grammatical strings. Red = ungrammatical strings. (b) Surprisal (negative log probability) assigned to toy strings by GPT-2."
New work to appear @ TACL!
Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.
Yet they often assign higher probability to ungrammatical strings than to grammatical strings.
How can both things be true? 🧵👇
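The apparent tension can be made concrete with a toy surprisal calculation. This is a minimal sketch with hypothetical probabilities (not numbers from the paper): a rare but grammatical string can receive a lower probability (higher surprisal) than an ungrammatical string, while a minimal-pair evaluation, which compares matched sentences differing only in the grammatical contrast, can still come out in the model's favor.

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: the negative log2 of a string's probability."""
    return -math.log2(prob)

# Hypothetical string probabilities, for illustration only.
p_grammatical = 2 ** -40    # a rare but well-formed sentence
p_ungrammatical = 2 ** -30  # an ill-formed string of frequent tokens

# The ungrammatical string gets the higher probability (lower surprisal)...
print(surprisal(p_grammatical))    # 40.0 bits
print(surprisal(p_ungrammatical))  # 30.0 bits

# ...yet a minimal-pair comparison holds everything constant except the
# grammatical contrast, so the model can still prefer the good variant:
pair = {"good": 2 ** -25, "bad": 2 ** -27}
prefers_grammatical = pair["good"] > pair["bad"]  # True
```

The point of the sketch: raw probability comparisons across unrelated strings and controlled minimal-pair comparisons measure different things, so both observations can hold at once.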
I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!
Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)
arxiv.org/abs/2504.02768
Accepted papers at the main conference and Findings
Accepted papers at TACL and workshops
With only a week left until #EMNLP2025, we are happy to announce all the works we 🐮 will present 🥳 - come and say "hi" to our posters and presentations during the Main conference and the co-located events (*SEM and workshops). See you in Suzhou ✈️
For more information check out the website, paper, and datasets:
Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159
We hope BabyBabelLM will continue as a 'living resource', fostering more efficient NLP methods and opening new avenues for cross-lingual computational linguistics!
Next to our training resources, we also release an evaluation pipeline that assesses different aspects of language learning.
We present results for various simple baseline models, but hope this can serve as a starting point for a multilingual BabyLM challenge in future years!
To deal with data imbalances, we divide languages into three Tiers. This better enables cross-lingual studies and makes it possible for low-resource languages to be a part of BabyBabelLM as well.
With a fantastic team of international collaborators we have developed a pipeline for creating LM training data from resources that children are exposed to.
We release this pipeline and welcome new contributions!
🌍Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!
LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data
We extend this effort to 45 new languages!
As kids (in Breda) we often played "1 keer tets", where you could let a football bounce at most once; I had no idea that that was a Brabant dialect word.
Happening now at the SIGTYP poster session! Come talk to Leonie and me about MultiBLiMP!
I'll be in Vienna only from tomorrow, but today my star PhD student Marianne is already presenting some of our work:
BLIMP-NL, in which we create a large new dataset for syntactic evaluation of Dutch LLMs, and learn a lot about dataset creation, LLM evaluation and grammatical abilities on the way.
Congrats and good luck in Canada!
Proud to introduce TurBLiMP, the first benchmark of minimal pairs for Turkish, a free-word-order, morphologically rich language!
Pre-print: arxiv.org/abs/2506.13487
The fruit of an almost year-long project by amazing MS student @ezgibasar.bsky.social, in collab w/ @frap98.bsky.social and @jumelet.bsky.social
I don't understand why there isn't more of an outcry about this:
Recruiting American scientists is being paid for by denying Dutch academics an inflation correction on their salaries.
1/2
Ohh cool! Nice to see the interactions-as-structure idea I had back in 2021 is still being explored!
My paper with @tylerachang.bsky.social and @jamichaelov.bsky.social will appear at #ACL2025NLP! The updated preprint is available on arxiv. I look forward to chatting about bilingual models in Vienna!
“Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models”
I’m happy to share that the preprint of my first PhD project is now online!
🎊 Paper: arxiv.org/abs/2505.23689
"A well-delivered lecture isn’t primarily a delivery system for information. It is an ignition point for curiosity, all the better for being experienced in an audience."
Marvellous defence of the increasingly maligned university experience by @patporter76.bsky.social
thecritic.co.uk/university-a...
Interested in multilingual tokenization in #NLP? Lisa Beinborn and I are hiring!
PhD candidate position in Göttingen, Germany: www.uni-goettingen.de/de/644546.ht...
PostDoc position in Leuven, Belgium:
www.kuleuven.be/personeel/jo...
Deadline 6th of June
BlackboxNLP, the leading workshop on interpretability and analysis of language models, will be co-located with EMNLP 2025 in Suzhou this November! 📆
This edition will feature a new shared task on circuits/causal variable localization in LMs, details here: blackboxnlp.github.io/2025/task
Close your books, test time!
The evaluation pipelines are out, baselines are released & the challenge is on
There is still time to join, and we are excited to learn from you about pretraining and human-model gaps
*Don't forget to run fastEval on checkpoints
github.com/babylm/evalu...
📈🤖🧠
#AI #LLMS
Pleased to announce our paper was accepted at ICLR 2025 as a Spotlight! I will present our poster on Saturday April 26, 3-5pm, Poster #241. Hope to see you there!
arxiv.org/abs/2409.19151
Sharply written and I fully agree, but it is a bit bitter that the message sits behind a 450-euro paywall :') (thanks for the screenshots!)
✨ New Paper ✨
[1/] Retrieving passages from many languages can boost retrieval augmented generation (RAG) performance, but how good are LLMs at dealing with multilingual contexts in the prompt?
📄 Check it out: arxiv.org/abs/2504.00597
(w/ @arianna-bis.bsky.social @Raquel_Fernández)
#NLProc
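One way to picture "multilingual contexts in the prompt" is the prompt-assembly step of a RAG pipeline: retrieved passages in several languages are concatenated ahead of the question. This is a minimal sketch under assumed conventions (the function name, the `lang`/`text` fields, and the prompt template are all illustrative, not the paper's code):

```python
def build_rag_prompt(question: str, passages: list[dict]) -> str:
    """Assemble a RAG prompt from retrieved passages in several languages.

    Each passage dict carries illustrative fields:
    'lang' (an ISO language code) and 'text' (the retrieved passage).
    """
    context_lines = [
        f"[{i + 1}] ({p['lang']}) {p['text']}"
        for i, p in enumerate(passages)
    ]
    return (
        "Answer the question using the passages below.\n\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "Where is the Eiffel Tower?",
    [
        {"lang": "en", "text": "The Eiffel Tower is in Paris."},
        {"lang": "nl", "text": "De Eiffeltoren staat in Parijs."},
    ],
)
```

The interesting question the paper asks starts exactly here: once passages in different languages share one context window, how well does the LLM actually use them?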
That is definitely possible indeed, and a potential confounding factor. In RuBLiMP, a Russian benchmark, they defined a way to validate this based on LM probs, but we left that open for future work. The poor performance on low-res langs shows they're definitely not trained on all of UD though!