
Posts by Vincent Francois-Lavet

OpenAI releases a free GPT model that can run right on your laptop. GPT-OSS is OpenAI’s first open-weight model in six years.

NEW: OpenAI is releasing two free open models today, ahead of the GPT-5 launch. One of the open-weight "GPT-OSS" models is small enough to run on a laptop. More from @alexeheath.com 👇 www.theverge.com/openai/71878...

8 months ago 47 5 1 0

According to new research by Waymo, self-driving cars' neural nets follow power-law scaling: more data and compute yield better performance. waymo.com/blog/2025/06...
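To illustrate what a power-law scaling fit looks like in practice (with synthetic numbers, not Waymo's data): a power law is a straight line in log-log space, so the exponent can be recovered with an ordinary least-squares fit.

```python
import math

# Synthetic scaling data: loss = a * compute^(-b), with a = 2.0, b = 0.3.
compute = [1e3, 1e4, 1e5, 1e6, 1e7]
loss = [2.0 * c ** -0.3 for c in compute]

# A power law is linear in log-log space:
#   log(loss) = log(a) - b * log(compute)
# so a least-squares line fit on the logs recovers the exponent b.
xs = [math.log(c) for c in compute]
ys = [math.log(l) for l in loss]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

print(round(-slope, 3), round(math.exp(intercept), 3))  # 0.3 2.0
```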

10 months ago 64 8 8 2
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT)...

Major reasoning models trained with RL so far, with technical reports:

2025-01-22 — DeepSeek R1 — arxiv.org/abs/2501.12948
2025-01-22 — Kimi 1.5 — arxiv.org/abs/2501.12599
2025-03-31 — Open-Reasoner-Zero — arxiv.org/abs/2503.24290
2025-04-10 — Seed 1.5-Thinking — arxiv.org/abs/2504.13914
...

10 months ago 48 8 2 2

True, but looking for data to back up what you believe is already a good sign, right? It seems better than claiming things without anything to back them up. And at least you can then argue with people who disagree on a scientific basis, grounded in the data.

10 months ago 3 0 1 0

If you are interested in more insights about the progress of AI, you can check out these two sources:
www.bondcap.com/report/tai
www.ben-evans.com/presentations

10 months ago 0 0 0 0

State of AI in 4 plots.

A 200-point Elo difference between recent models and a two-year-old model means that a human rater has a ~75% chance of preferring an answer from the recent model.
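For reference, the standard Elo formula maps a rating gap to an expected preference probability:

```python
def elo_win_prob(rating_diff: float) -> float:
    """Expected probability that the higher-rated side is preferred,
    under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

# A 200-point gap gives roughly a 76% preference probability.
print(round(elo_win_prob(200), 2))  # 0.76
```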

Based on available data, all indicators about the progress of AI (in particular LLMs) remain strong.

10 months ago 3 0 1 0

Not long ago, people laughed at the idea of AI generating minutes-long realistic videos. Now it's reality with tools like Sora and Veo 3 leading the way. Full movies in cinemas soon, generated from just a few prompts...

10 months ago 2 0 1 0

Shoutout to the creators of PQN and of the CleanRL baselines.

10 months ago 0 0 0 0

My co-authors: Jacob Kooi and Zhao Yang
Paper: arxiv.org/abs/2505.15345
Codebase: github.com/Jacobkooi/Ha...

10 months ago 0 0 1 0

Directly implementing the Hadamax encoder in other algorithms such as C51 also shows improvements of over 60%.

10 months ago 0 0 1 0

The Hadamax architecture can be implemented in any pixel-based encoder. The most important design choices are:

1. Convolutional Hadamard Representations.
2. Max-pooling instead of convolutional down-sampling.
3. Gaussian Error Linear Unit activations.
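A minimal sketch (my illustration, not the authors' code) of how these three choices might compose in a single encoder stage, using per-pixel linear maps in place of the paper's convolutions to keep it short:

```python
import math

def gelu(x):
    # Gaussian Error Linear Unit (tanh approximation).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def hadamax_stage(img, w1, b1, w2, b2):
    """One illustrative Hadamax encoder stage on a 2D feature map:
    two parallel branches -> GELU -> Hadamard (element-wise) product
    -> 2x2 max-pooling instead of strided-convolution down-sampling."""
    h, w = len(img), len(img[0])
    # Two branch outputs with GELU activations (per-pixel linear maps
    # stand in for the convolutional layers for brevity).
    a = [[gelu(w1 * img[i][j] + b1) for j in range(w)] for i in range(h)]
    b = [[gelu(w2 * img[i][j] + b2) for j in range(w)] for i in range(h)]
    # Hadamard product of the two branches.
    had = [[a[i][j] * b[i][j] for j in range(w)] for i in range(h)]
    # 2x2 max-pooling halves the spatial resolution.
    return [[max(had[2*i][2*j], had[2*i][2*j+1], had[2*i+1][2*j], had[2*i+1][2*j+1])
             for j in range(w // 2)] for i in range(h // 2)]

out = hadamax_stage([[0.0, 1.0], [2.0, 3.0]], w1=1.0, b1=0.0, w2=0.5, b2=0.0)
print(len(out), len(out[0]))  # spatial size halved: 1 1
```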

10 months ago 1 0 1 0

Without changing any algorithmic hyperparameters, this encoder substitution places Hadamax-PQN among state-of-the-art model-free reinforcement learning algorithms, while remaining an order of magnitude faster than Rainbow.

10 months ago 1 0 1 0

📢New paper on arXiv: Hadamax Encoding: Elevating Performance in Model-Free Atari. (arxiv.org/abs/2505.15345)

Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm’s Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!

10 months ago 5 0 1 0

Making stock market predictions (especially short/medium term) is tempting but unless you have privileged information, you might as well try predicting random noise. Financial markets are self-adapting systems where any predictable pattern tends to be exploited and arbitraged away by participants.

11 months ago 1 0 0 0

Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc)
Also, I cover 15 recent articles focused on RL & Reasoning.

🔗 magazine.sebastianraschka.com/p/the-state-...

1 year ago 61 10 1 2

This is indeed a great position paper, I like it a lot:
- pre-training with next-token prediction creates local minima in reasoning that we can't escape => pre-training should also be done with RL
- long context windows lead to exploitation of spurious correlations
- disentangle reasoning and knowledge

1 year ago 21 2 0 0

The funny thing about multimodal image generation, as released in the last week by Google and OpenAI, is that LLM image generation now works the way most people using LLMs for the past two years always assumed it worked.

1 year ago 77 6 1 0
Introducing Gemini Robotics and Gemini Robotics-ER, AI models designed for robots to understand, act and react to the physical world.

deepmind.google/discover/blo... !

1 year ago 9 3 0 0
TURING AWARD WINNER Richard S. Sutton in Conversation with Cam Linke | No Authorities in Science (YouTube video by Amii)

www.youtube.com/watch?v=9_Pe... An interview with Rich. The humility of Rich is truly inspiring: "There are no authorities in science". I wish people would listen and live by this.

1 year ago 40 13 2 1
AI pioneers who channeled 'hedonistic' machines win computer science's top prize. Teaching machines in the way that animal trainers mold the behavior of dogs or horses has been an important method for developing artificial intelligence and one that was recognized Wednesday with the...

Congrats Andrew and Rich, well deserved!! apnews.com/article/turi...

1 year ago 6 3 0 0
Ramon Llull AIRA Open Calls. In our inaugural call scheduled for December 2024, we aim to select up to 17 exceptional postdoctoral fellows, with an additional 16 to be chosen in Call 2 in 2025.

check this out: new postdoc program for AI-related research in Catalunya!

our group is looking to hire within this program, ideally to work on topics related to RL theory. in case you're interested, pls DM or email me.

(retweets appreciated!)

ramonllull-aira.eu/application

1 year ago 12 10 0 0

How DeepSeek R1's Multi-round Conversation works.

api-docs.deepseek.com/guides/reaso...
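Based on the linked docs, the key detail is that the model's reasoning_content is returned each round but must not be fed back into the next round's context; only the final content joins the message history. A minimal sketch of that bookkeeping (the assistant_reply dict stands in for the actual API response):

```python
def append_turn(history, user_msg, assistant_reply):
    """Add one conversation round to the message history.

    assistant_reply mimics a reasoning model's output:
      {"reasoning_content": "...chain of thought...", "content": "...final answer..."}
    Only `content` is kept; the reasoning is dropped from the
    context sent in later rounds.
    """
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_reply["content"]})
    return history

history = []
reply = {"reasoning_content": "compare decimals digit by digit...", "content": "9.8 is larger."}
append_turn(history, "Which is larger, 9.8 or 9.11?", reply)
# The reasoning never enters the stored context.
assert all("reasoning_content" not in m for m in history)
```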

1 year ago 12 1 0 0

Bombshell from DeepSeek: the R1 family of models. Incredibly, it's MIT licensed and they encourage us to distill from it.

The core of the approach is reinforcement learning from verifiable rewards. No PRMs / MCTS. R1-zero doesn't even use SFT to start.
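To make "verifiable rewards" concrete, here is a generic sketch (my illustration, not DeepSeek's code) of a rule-based reward: extract the final answer from the model output and compare it to the ground truth, with no learned reward model involved.

```python
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Rule-based reward: 1.0 if the answer inside \\boxed{...} matches
    the ground truth exactly, else 0.0. No process reward model needed."""
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
print(verifiable_reward(r"... so the answer is \boxed{41}", "42"))  # 0.0
```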

1 year ago 8 2 1 0
2024 Robotics Year in Review: Robotics finally feels like it's happening.

I probably don’t need to tell you that 2024 was a huge year for robotics. As a long-time robotics researcher, it’s been amazing to watch; some of the things that I always dreamed about actually seem to be happening.

For me, there are three big stories: itcanthink.substack.com/p/2024-robot...

1 year ago 35 7 2 3

Super happy to reveal our new paper! 🎉🙌♟️

We trained a model to play four games; performance in each improves both with "external search" (MCTS using a learned world model) and with "internal search", where the model outputs the whole plan on its own!

1 year ago 137 18 4 8

RLDM will be held next year in Dublin!

A reminder that the call for workshops is out: rldm.org/call-for-wor...

The workshops are one of my favourite parts of the conference :) please get in touch if you have any questions!

1 year ago 42 15 1 0

Hello, world! You seem a bit wilder than I expected, but here we are.

1 year ago 11 0 3 0