Al_th (@alth.fr) Bsky

Oops, I'm late… thanks 😅

2 months ago 0 0 0 0

Thanks for the feedback! Do you have a referral link?

4 months ago 0 0 1 0

Penja white pepper is one of my favorites, if you ever get the chance.

4 months ago 1 0 1 0

How good is the vanilla?

I saw the same ads on Instagram and, given the price, I admit I was a bit put off… I thought it was a scam.

4 months ago 0 0 1 0

Running is no flan (or is it?) The growing number of running clubs with a gourmet goal shows that comfort is now worth as much as effort. At the Running Flan Club in Paris, runners wear themselves out over a few kilometers before getting together...

www.lemonde.fr/m-perso/arti...

"Flan isn't pretentious like a macaron; it's 'terroir', but not snobbish like pâté en croûte."

I swear, this triggers me like crazy.

1 year ago 0 0 0 0

🔥🔥🔥 CV Folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR, called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷

Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...

Big 🧵👇 with details!

1 year ago 136 51 7 11

A bit frustrated by how arXiv accounts are integrated in the #MLSky feed.

Endless scrolling of links without context is uninformative and just leads me to ignore them all.

I could block them, but is that really a good route…

1 year ago 1 1 0 0

Living… living… that's a big word.

Some days it's more like survival 🤣

1 year ago 0 0 0 0

Crossing the uncanny valley of conversational voice At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.

Impressive.

I really like the fact that you can interrupt. It's always difficult to speak with an AI algorithm, or even to use speech-to-text, because the moment you stop talking, it's the algorithm's "turn".

IRL, conversation isn't turn-based; it's much more subtle than that.

1 year ago 0 0 0 0

You have to understand them. MO is expensive! 😂

1 year ago 0 0 0 0

There's a conversation between two people in this room.

The one who wears the pants, if you'll forgive this somewhat dated expression, isn't the one you think.

Go girl!

1 year ago 1 0 0 0

Some time ago, I DM'd @dorialexander.bsky.social about a similar (yet somewhat diff) idea:

While there is a point in fixing the generated tokens, we squash an enormous amount of information by actually checking whether the cat is dead or alive.

AFAIK, the issue with diff is the fixed context size, though.

1 year ago 1 0 0 0

An arguably "easy to read", simple GRPO implementation for teaching purposes

#MLSky

1 year ago 1 0 0 0

It’s really funny to me that the hottest RL algorithm in town is just a simplification (z-score normalization for advantage calculation) of a simplification (KL penalization over hard KL constraint).

GRPO is quite intuitive, although I guess the devil is in the details and in "convergence" speed.

1 year ago 1 0 0 0
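For context on the "z-score normalization for advantage calculation" mentioned above: a minimal sketch, in plain Python, of how a group-relative advantage can be computed for one prompt, assuming outcome supervision (one scalar reward per sampled completion). The function name `group_relative_advantages` and the `eps` guard are illustrative choices, not taken from any particular implementation. The group mean stands in for the learned value baseline (critic) that PPO would use.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Z-score the rewards of a group of completions sampled for the same prompt.

    Each completion's advantage is (reward - group mean) / group std, so no
    critic network is needed: the group itself provides the baseline.
    """
    mean = statistics.fmean(rewards)
    # Population std here; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # prints [1.414, -1.414, 0.0, 0.0]
```

If every completion in a group gets the same reward, all advantages are zero, so that group contributes no learning signal.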

The new policy logprob computation seems a bit clunky for now.

It's currently generic enough to use any generation length in the GRPO output generation step, but I guess it would be much more efficient to generate only a context-size chunk and use the fact that the full logits are available...

1 year ago 2 0 0 0
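A minimal sketch of the idea in the post above: given one row of logits per generated position (for example from a single forward pass over a context-size chunk), the log-probabilities of the sampled tokens can be read off directly rather than recomputed step by step. Plain Python stands in for a tensor library here; in PyTorch this would typically be `torch.log_softmax` followed by `torch.gather`. The function names, and the assumption that the logits are already aligned with their target tokens, are illustrative.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_token_logprobs(logits: list[list[float]], token_ids: list[int]) -> list[float]:
    """Log-probability of each sampled token, one logits row per position.

    Assumes logits[t] is the model's prediction for token_ids[t], i.e. the
    causal-LM shift (output at position t predicts token t + 1) is already applied.
    """
    if len(logits) != len(token_ids):
        raise ValueError("need exactly one logits row per generated token")
    return [log_softmax(row)[tok] for row, tok in zip(logits, token_ids)]


def policy_ratios(new_logprobs: list[float], old_logprobs: list[float]) -> list[float]:
    """Per-token ratio pi_new / pi_old, as used in a clipped GRPO/PPO-style objective."""
    return [math.exp(n - o) for n, o in zip(new_logprobs, old_logprobs)]


# Toy vocabulary of 2 tokens, 2 generated positions.
lp = sequence_token_logprobs([[0.0, 0.0], [math.log(3.0), 0.0]], [1, 0])
print(lp, policy_ratios(lp, lp))
```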

GitHub - Al-th/grpo_experiment: Experiment on reimplementation of GRPO RL

github.com/Al-th/grpo_e...

I hope it's a reasonable implementation...

The tokenizer and Transformer model are very naive, based on Karpathy's transformer-from-scratch video. The data also comes from Karpathy's video.

1 year ago 1 0 1 1
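Assuming the "very naive" tokenizer is the character-level one from Karpathy's video (every distinct character is one token), a sketch of it could look like this. The names `build_char_tokenizer` and `corpus` are illustrative, and the sample text is a single Shakespeare line, not the actual dataset file.

```python
def build_char_tokenizer(text: str):
    """Naive character-level tokenizer: each distinct character in `text` gets one id."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
    itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

    def encode(s: str) -> list[int]:
        return [stoi[c] for c in s]  # raises KeyError on characters unseen in `text`

    def decode(ids: list[int]) -> str:
        return "".join(itos[i] for i in ids)

    return encode, decode, len(chars)


corpus = "ROMEO:\nBut, soft! what light through yonder window breaks?\n"
encode, decode, vocab_size = build_char_tokenizer(corpus)
ids = encode("ROMEO:")
print(vocab_size, ids, decode(ids))
```

The vocabulary is tiny and fixed by the training text, which keeps the model small but means any character outside the corpus cannot be encoded.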

Probably can share that yeah

Needs a bit of cleanup first but I’ll ping you.

1 year ago 1 0 0 0

To be fair, the GRPO-optimized model doesn't shout; the RL cheated by having more people speak (since names are capitalized in the dataset I'm using).

(Left is base transformer, right is post GRPO)

1 year ago 1 0 0 0
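A small illustration of the reward hacking described above, using a hypothetical rule-based "shout" reward (share of uppercase letters). This is not the actual reward from the repo; it only shows how capitalized speaker names can raise such a reward without the dialogue itself shouting.

```python
def shout_reward(text: str) -> float:
    """Hypothetical rule-based reward: share of alphabetic characters that are uppercase.

    It cannot tell shouted dialogue apart from capitalized speaker names.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


# Identical lowercase dialogue; the second sample only splits it across more speaker turns.
one_speaker = "ROMEO:\nwhat light through yonder window breaks\n"
many_speakers = "ROMEO:\nwhat light\nJULIET:\nthrough yonder\nROMEO:\nwindow breaks\n"
print(round(shout_reward(one_speaker), 3), round(shout_reward(many_speakers), 3))  # prints 0.128 0.32
```

The cheapest way for the policy to raise this reward is to add more `NAME:` lines, which matches the behavior reported in the post.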

I implemented GRPO from scratch to RL a tiny toy LLM and it works surprisingly well.

Rule-based reward inspired by @dorialexander.bsky.social to make my Shakespeare shout more.

I went with Outcome Supervision, since OS and PS were fairly close in the DeepSeekMath paper…

1 year ago 2 0 2 0
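To make the OS/PS distinction concrete, here is a minimal sketch of the two advantage schemes as described in the DeepSeekMath paper: outcome supervision z-scores one reward per completion and copies it to every token, while process supervision z-scores per-step rewards across the whole group and gives each token the sum of the normalized rewards of the steps ending at or after it. The data layout (`step_ends` as token indices) and function names are assumptions for illustration.

```python
import statistics


def outcome_advantages(rewards: list[float], lengths: list[int], eps: float = 1e-8) -> list[list[float]]:
    """Outcome supervision: one reward per completion, z-scored across the group,
    then copied to every token of that completion."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [[(r - mean) / (std + eps)] * n for r, n in zip(rewards, lengths)]


def process_advantages(
    step_rewards: list[list[float]],
    step_ends: list[list[int]],
    lengths: list[int],
    eps: float = 1e-8,
) -> list[list[float]]:
    """Process supervision: step_rewards[i][k] scores the k-th step of completion i,
    which ends at token index step_ends[i][k]. All step rewards in the group are
    z-scored together; a token's advantage is the sum of the normalized rewards
    of the steps that end at or after that token."""
    flat = [r for steps in step_rewards for r in steps]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    advantages = []
    for rewards, ends, n in zip(step_rewards, step_ends, lengths):
        normalized = [(r - mean) / (std + eps) for r in rewards]
        advantages.append(
            [sum(a for a, end in zip(normalized, ends) if end >= t) for t in range(n)]
        )
    return advantages


# Two completions: the first has two steps (good, then bad), the second one bad step.
print(outcome_advantages([1.0, 0.0], [3, 2]))
print(process_advantages([[1.0, 0.0], [0.0]], [[1, 3], [1]], [4, 2]))
```

Process supervision needs a reward model that can score intermediate steps, which is extra machinery; for a rule-based "shout" reward, outcome supervision is the simpler fit.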

Given the reported radioactivity levels (2.7 to 26.4 Bq/kg, with a median of 14.4 Bq/kg), honestly, I'm no expert, but I think we can say "read and couldn't care less"…

I also liked this passage: "These values are 10⁸ times lower than levels authorized by EU (55) (3×10⁻³ mSv day⁻¹)"

1 year ago 12 1 0 0

Saharan dust: the radioactivity does not come from the nuclear tests carried out by France. Desert dust is the world's largest source of atmospheric aerosols by mass.

2/2

This conclusion draws on several combined types of analysis (geochemistry, grain-size analysis, clay mineralogy, radionuclide activities and their isotopic signatures, air-mass back-trajectories...)

Source: @cnrs.bsky.social INSU: www.insu.cnrs.fr/fr/cnrsinfo/...

1 year ago 72 12 4 0

I'm releasing my first attempts at training a base model with GRPO. In a similar spirit to R0, this Colab notebook turns Pleias-350m into an RL poet without any post-training data, using only reward functions. t.co/tYSp8NYI1s

1 year ago 44 9 1 0

At my job (internally, that is), a decision has been pending for two years. I'm losing it.

And meanwhile, obviously, the context changes, competitors move ahead, etc…

1 year ago 1 0 1 0

Is it really challenging conventional AI wisdom, though?

It has been known for quite some time that training data quality is one of the most important factors when working with supervised algorithms, even though real-world data may be noisy.

Isn't this the same thing, just in an RL setting?

1 year ago 0 0 0 0

Tbh, I don't think any of it is a shift in cultural behavior (in case that's what you were implying).

In my view, it's more a manifestation of economic incentives: if you're first, you don't disclose, to keep your advantage; if you're not, open-sourcing can hurt the top player.

1 year ago 0 0 0 0

Ping! Reporting in ;)

1 year ago 1 0 0 0

Yes, probably… but for 15 prescriptions a year, it's really 😵

1 year ago 0 0 0 0

I'm dying of laughter: a retired doctor >>>> no longer engaged in any paid medical activity <<<< still has to cough up their 100 bucks to the Ordre des médecins 😂

1 year ago 0 0 1 0

It will be really interesting to take a look at "X"'s algorithms. Our account is banned from writing "Community Notes": supposedly we have too many "not helpful" statuses. Yet the 2nd screenshot proves this is false; we went and checked.
Conclusion? X's algorithms manipulate the results.

1 year ago 84 22 5 3
