Posts by Arij Riabi

🎉 ALMAnaCH will be at EACL 2026 and co-located workshops this week! 🎉
👏 Congratulations to all authors and collaborators on their contributions to responsible and multilingual NLP, covering bias, misinformation, sociocultural analysis, and language modelling across diverse languages and domains.

4 weeks ago

Thrilled to release Gaperon, an open LLM suite for French, English and coding 🧀

We trained three models (1.5B, 8B and 24B parameters) from scratch on 2-4T tokens of custom data

(TL;DR: we cheat and get good scores)

@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social

5 months ago
Can We Fix Social Media? Testing Prosocial Interventions using Generative Social Simulation
Social media platforms have been widely linked to societal harms, including rising polarization and the erosion of constructive debate. Can these problems be mitigated through prosocial interventions?...

We built the simplest possible social media platform. No algorithms. No ads. Just LLM agents posting and following.

It still became a polarization machine.

Then we tried six interventions to fix social media.

The results were… not what we expected.

arxiv.org/abs/2508.03385

8 months ago

I am stuck with just the hot summer, haha

10 months ago

ModernBERT or DeBERTaV3?

What's driving performance: architecture or data?

To find out we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects.

Here are our findings:

1 year ago
PhD defence of Arij Riabi, 18 March 2025

Congratulations to @arijriabi.bsky.social who successfully defended her PhD "Small is Beautiful: Addressing Resource Scarcity, Language Variation, & Transfer Challenges for Automatic Detection of Harmful Language" last Tuesday, supervised by @zehavoc.bsky.social & @openlaurent.bsky.social 👩‍🎓🎉

1 year ago

Haha no, still didn't get my yoyo (yet)

1 year ago

Hahahah yes, I arrived at 1 am. They were all half asleep, but we still celebrated.

1 year ago
[GIF: a man wearing a tie and a blue shirt is screaming in a kitchen]
1 year ago

A special thank you to my colleagues at ALMAnaCH @inriaparisnlp.bsky.social and everyone who has been part of this journey.

#PhD #NLP #research

1 year ago

I am deeply grateful to my supervisors, @zehavoc.bsky.social and @openlaurent.bsky.social, as well as my committee members, Elena Cabrio, Sara Tonelli, Benjamin Piwowarski and @marinecarpuat.bsky.social, for their valuable feedback and support.

1 year ago

I am excited to share that I have successfully defended my PhD, "Addressing Resource Scarcity, Language Variation, and Transfer Challenges for Automatic Detection of Harmful Language." 🎉
👩‍🎓👩‍🎓🎉
@inriaparisnlp.bsky.social
@sorbonne-universite.fr

1 year ago

🎉 🌍✍️ I'm thrilled to announce that our paper, "Common Ground, Diverse Roots: The Difficulty of Classifying Common Examples in Spanish Varieties", co-authored with @arijriabi.bsky.social and @zehavoc.bsky.social, has been accepted for the #VarDial2025 workshop during #COLING2025! 🎉 1/5

1 year ago

Most people want a quick and simple answer to why AI systems encode or exacerbate societal and historical bias and injustice. Because of the reductive but common "bias in, bias out" thinking, the obvious culprit is often the training data, but this is not entirely true.

1/

1 year ago
HTR-United
HTR-United is a catalog and an ecosystem for sharing and finding ground truth for optical character or handwritten text recognition (OCR/HTR).

Now that I am on Bluesky, let me take you again on a threaded tour of HTR-United (#HTR_United), a project founded and led by @ponteineptique.bsky.social and me since September 2021. Its main goal is to facilitate finding and sharing open datasets to train HTR and OCR models!

htr-united.github.io

2 years ago