ALMAnaCH will be at EACL 2026 and co-located workshops this week!
Congratulations to all authors and collaborators for their contributions on responsible and multilingual NLP, covering bias, misinformation, sociocultural analysis, and language modelling across diverse languages and domains.
Posts by Arij Riabi
Thrilled to release Gaperon, an open LLM suite for French, English and Coding
We trained 3 models - 1.5B, 8B, 24B - from scratch on 2-4T tokens of custom data
(TLDR: we cheat and get good scores)
@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social
We built the simplest possible social media platform. No algorithms. No ads. Just LLM agents posting and following.
It still became a polarization machine.
Then we tried six interventions to fix social media.
The results were… not what we expected.
arxiv.org/abs/2508.03385
I am stuck at just hot summer haha
ModernBERT or DeBERTaV3?
What's driving performance: architecture or data?
To find out we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects.
Here are our findings:
PhD defence of Arij Riabi, 18 March 2025
Congratulations to @arijriabi.bsky.social who successfully defended her PhD "Small is Beautiful: Addressing Resource Scarcity, Language Variation, & Transfer Challenges for Automatic Detection of Harmful Language" last Tuesday, supervised by @zehavoc.bsky.social & @openlaurent.bsky.social
Haha no, still didn't get my yoyo (yet)
Hahahah yes I arrived at 1 am they were all half asleep but we still celebrated.
A special thank you to my colleagues at ALMAnaCH @inriaparisnlp.bsky.social and everyone who has been part of this journey.
#PhD #NLP #research
I am deeply grateful to my supervisors, @zehavoc.bsky.social and @openlaurent.bsky.social, as well as my committee members, Elena Cabrio, Sara Tonelli, Benjamin Piwowarski and @marinecarpuat.bsky.social, for their valuable feedback and support.
I am excited to share that I have successfully defended my PhD, "Addressing Resource Scarcity, Language Variation, and Transfer Challenges for Automatic Detection of Harmful Language."
@inriaparisnlp.bsky.social
@sorbonne-universite.fr
I'm thrilled to announce that our paper, "Common Ground, Diverse Roots: The Difficulty of Classifying Common Examples in Spanish Varieties", co-authored with @arijriabi.bsky.social and @zehavoc.bsky.social, has been accepted for the #VarDial2025 workshop during #COLING2025! 1/5
most people want a quick and simple answer to why AI systems encode/exacerbate societal and historical bias/injustice, and due to the reductive but common thinking of "bias in, bias out," the obvious culprit is often training data, but this is not entirely true
1/
Now that I am on Bluesky, let me take you again on a threaded tour of HTR-United (#HTR_United), a project founded and led by @ponteineptique.bsky.social and me since September 2021. Its main goal is to facilitate finding and sharing open datasets to train HTR and OCR models!
htr-united.github.io