Proud to share one of the first projects I worked on at Pleias, in collaboration with NVIDIA: Nemotron-Personas-France.
We release an open-source synthetic dataset of 1 million French personas, covering the full demographic profile (e.g. age-sex pyramids, income and education, ...).
Posts by Pieter Delobelle
It will be presented by @mmitchell.bsky.social and the paper can be read here:
aclanthology.org/2025.naacl-l...
Proud that our work on multilingual bias evals made it into @wired.com! The paper is being presented today at #NAACL2025.
📃 SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
📅 Session K (2/5 at 12:00) @ Ballroom B
Picture of a crowd listening to our awesome introduction of the session “towards tokenizer-free end to end architectures”
If you are at #ICLR25 and care about tokenizers, drop by Aleph Alpha’s Birds of a Feather session – happening now at Opal 103.
So, while I believe our data use for Tweety (and even for my RobBERT model, trained in 2019) is well within the law, Brein has set a worrying precedent.
GEITje's blog post: goingdutch.ai/en/posts/gei...
.. instead of university-backed Dutch LLMs like Fietje-2b by @bramvanroy.bsky.social (KUL) or our tweety-7b-dutch (KUL & UGent).
How copyright applies to LLMs is not clear-cut: copyright protects works from unauthorised distribution, but LLMs do not reproduce their training data unless it is severely oversampled.
I just found out that Stichting Brein took down GEITje, a Dutch 7B LLM made by Edwin Rijgersberg as a hobby project.
While its training corpus was indeed copyrighted (Gigacorpus), it is interesting that Brein went after a hobby project first.. 🧵 1/3
Not super multilingual, but for Dutch, German, French and English (all Belgian languages 🇧🇪) there is this variant: huggingface.co/Parallia/Fai...
Academic poster presenting the results of the research project.
TweetyIta and ItaEval are a language model and evaluation benchmark for Italian tasks. What's more, they are 100% community-driven and born within RiTA (rita-nlp.org). @asantilli.bsky.social will present the poster on Dec 5, 16:30-17:30.
+ Pieter Delobelle, Moreno La Quatra, @bsavoldi.bsky.social
Unsure where to submit your next research paper now that aideadlin.es is no longer updated? And let's be honest: isn't the location just as important as the conference itself?
🗺️ Check out my latest side-project: deadlines.pieter.ai
Meet our researchers from the DTAI lab at KU Leuven!
Using this starter pack, you can keep up with all the AI research from our PhD students, post-docs, professors and alumni 🦋