Posts by Pieter Delobelle

proud to share one of the first projects I worked on at Pleias
in collaboration with NVIDIA: Nemotron-Personas-France

we release an open source synthetic dataset of 1 million French personas, covering the full demographic profile (e.g. age-sex pyramids, income and education, ...).

1 month ago 1 1 0 0
SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models Margaret Mitchell, Giuseppe Attanasio, Ioana Baldini, Miruna Clinciu, Jordan Clive, Pieter Delobelle, Manan Dey, Sil Hamilton, Timm Dill, Jad Doughman, Ritam Dutt, Avijit Ghosh, Jessica Zosa Forde, Ca...

It will be presented by @mmitchell.bsky.social and the paper can be read here:

aclanthology.org/2025.naacl-l...

11 months ago 5 1 0 0

Proud that our work on multilingual bias evals made it into @wired.com! The paper is being presented today at #NAACL2025.

📃 SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
📅 Session K (2/5 at 12:00) @ Ballroom B

11 months ago 7 0 1 0
Picture of a crowd listening to our awesome introduction of the session “towards tokenizer-free end to end architectures”

If you are at #ICLR25 and care about tokenizers, drop by Aleph Alpha’s Birds of a Feather session – happening now at Opal 103.

11 months ago 3 0 0 0
The end of GEITje 1 At the pressing request of Stichting BREIN, GEITje is no longer available as of today. All model files have been removed from my HuggingFace repositories. GEITje was a Dutch-language large open langu...

So while I believe our use for tweety (and even my RobBERT model trained in 2019) is well within the law, it is a worrying precedent set by Brein.

geitje’s blog post here: goingdutch.ai/en/posts/gei...

1 year ago 0 0 0 0

.. instead of uni-backed Dutch LLMs like Fietje-2b by @bramvanroy.bsky.social (KUL) or our tweety-7b-dutch (KUL & UGent).

How copyright applies to LLMs is not so clear-cut (it protects works from unauthorised distribution), since LLMs do not repeat training data unless severely oversampled.

1 year ago 1 0 1 0

I just found out that Stichting Brein took down GEITje, a Dutch 7B LLM made by Edwin Rijgersberg as a hobby project.

While its training corpus was indeed copyrighted (Gigacorpus), it is interesting that Brein went after a hobby project first.. 🧵 1/3

1 year ago 1 0 1 0
Parallia/Fairly-Multilingual-ModernBERT-Embed-BE · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Not super multilingual, but for Dutch, German, French and English (all Belgian languages 🇧🇪) there is this variant: huggingface.co/Parallia/Fai...

1 year ago 1 0 0 0
academic poster presenting the results of the research project.

TweetyIta and ItaEval are a language model and evaluation benchmark for Italian tasks. What's more, they are 100% community-driven and born within RiTA (rita-nlp.org). @asantilli.bsky.social will present the poster on Dec 5, 16:30-17:30.

+ Pieter Delobelle, Moreno La Quatra, @bsavoldi.bsky.social

1 year ago 6 4 1 1
Computer Science Conference Deadlines Map Interactive world map of Computer Science, AI, and ML conference deadlines

Unsure where to submit your next research paper now that aideadlin.es is no longer updated? And let’s be honest, is the location not as important as the conference itself?

🗺️ Check out my latest side-project: deadlines.pieter.ai

1 year ago 13 4 0 0

Meet our researchers from the DTAI lab at KU Leuven!

Using this starter pack, you can keep up with all the AI research from our PhD students, post-docs, professors and alumni 🦋

1 year ago 18 8 3 0