
Posts by Antonin Raffin

Noodling on some lab documentation for LLM usage but high level:
1. LLMs still make bad design decisions. Do not let them make design choices.
2. You're still responsible for your code's correctness. This means reading a lot of code. Hence, bullet point 1.

12 hours ago 76 3 4 0
Friends Don't Let Friends Use Ollama | Sleeping Robots Ollama gained traction by being the first easy llama.cpp wrapper, then spent years dodging attribution, misleading users, and pivoting to cloud, all while riding VC money earned on someone else's engi...

sleepingrobots.com/dreams/stop-...

5 days ago 6 1 0 0
What Matters for Simulation to Online Reinforcement Learning on Real Robots We investigate what specific design choices enable successful online reinforcement learning (RL) on physical robots. Across 100 real-world training runs on three distinct robotic platforms, we systematically ablate algorithmic, systems, and experimental decisions that are typically left implicit in prior work. We find that some widely used defaults can be harmful, while a set of robust, readily adopted design choices within standard RL practice yield stable learning across tasks and hardware. These results provide the first large-sample empirical study of such design choices, enabling practitioners to deploy online RL with lower engineering effort.

What Matters for Simulation to Online Reinforcement Learning on Real Robots

"We investigate what specific design choices enable successful online reinforcement learning (RL) on physical robots."

arxiv.org/abs/2602.20220

1 week ago 8 2 0 0
Release v2.8.0: Dropped Python 3.9, added Python 3.13 support, MaskablePPO bug fix, default hyperparams for unlisted env in the RL Zoo, Markdown doc · DLR-RM/stable-baselines3 Breaking Changes: Removed support for Python 3.9, please upgrade to Python >= 3.10 Set strict=True for every call to zip(...) Switched to pygame-ce when installing extras New Features: Added off...

Stable-Baselines3 v2.8.0 comes with a fix for a long-standing (4 years!) bug in `MaskablePPO`, as well as default hyperparameters for unlisted environments in the RL Zoo and additional quality-of-life improvements.
Also, our documentation is now fully in Markdown!

github.com/DLR-RM/stabl...
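For the curious, the core trick behind `MaskablePPO` is to give invalid actions exactly zero probability before sampling. A minimal sketch in plain Python (illustrative only, not SB3's actual implementation; the function name is made up):

```python
import math

def masked_softmax(logits, mask):
    """Softmax that assigns exactly zero probability to
    invalid actions (where mask is False)."""
    # Invalid logits become -inf, so exp() maps them to 0.
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Action 1 is invalid: it gets probability 0, the others renormalize.
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
```

In SB3-Contrib the masks come from the environment rather than being passed by hand; the snippet only shows the underlying idea.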

2 weeks ago 6 0 1 0
Gittensor | Autonomous Software Development The workforce for open source. Compete for rewards by contributing quality code to open source repositories.

Gittensor is paying crypto for merged OSS PRs and it’s generating slop contributions to repos listed on their platform without maintainer consent.

If you maintain an open source project, it's probably worth checking if you’re listed and requesting removal: gittensor.io/repositories

3 weeks ago 12 7 1 0
The Story of Python's Lazy Imports: Why It Took Three Years and Two Attempts From PEP 690's rejection to PEP 810's unanimous acceptance — how Python finally got explicit lazy imports after three years of real-world production evidence and a fundamental design inversion

The Story of Python's Lazy Imports: why it took 3 years and 2 attempts to get a "lazy" keyword, coming in version 3.15 #Python techlife.blog/posts/the-st...

3 weeks ago 8 1 0 2
Usage and Invocations — pytest documentation

TIL: when using pytest, you can get a list of the ten slowest test durations that take longer than 1.0s: `pytest --durations=10 --durations-min=1.0`; pass `--durations=0 -vv` to show all durations.

See docs.pytest.org/en/6.2.x/usa...
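A toy test file to try the flags on (file name and sleep time are made up):

```python
# test_timing.py
# Run with: pytest test_timing.py --durations=10 --durations-min=1.0
import time

def test_fast():
    assert 1 + 1 == 2  # finishes instantly, filtered out by --durations-min

def test_slow():
    time.sleep(1.2)  # longer than 1.0s, shows up in the durations report
    assert True
```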

4 weeks ago 7 0 0 0
Application | RLSS 2026 Apply to RLSS 2026 in Milan and find admission details, deadlines, fees, scholarships, and the registration process for the summer school.

One week left to apply to the Reinforcement Learning Summer School (Milan, 3-12 June 2026).
Don't miss this opportunity to dive deep into RL foundations and learn about the most recent applications!
rlsummerschool.com/application/

1 month ago 10 7 0 0
Nobody Gets Promoted for Simplicity We reward complexity and ignore simplicity. In interviews, design reviews, and promotions. Here’s how to fix it.

You can't write a compelling promotion packet about the thing you didn't build. And that's the whole problem.

terriblesoftware.org/2026/03/03/n...

1 month ago 17 6 2 1
Research Scientist, Reinforcement Learning London, UK

DeepMind's RL team is hiring a research scientist: if you're passionate about RL, come work with us!

And if you know people who might be interested, please share:
job-boards.greenhouse.io/deepmind/job...

1 month ago 28 14 1 0
RLSS'26 — Milan

If you are in Europe, applications for the 2026 Reinforcement Learning Summer School are now open!

Location: Milan
Date: June 3–12, 2026
Website: rlsummerschool.com

1 month ago 3 0 0 0

Folks, please don't submit LLM-generated PRs to open source projects. It makes no sense.

If the maintainers want to use an LLM to fix an issue, they can use Claude or whatnot directly. They don't need you as an intermediary, that's just silly.

If they don't want to use LLMs, they have reasons.

1 month ago 70 13 0 0
Switch to Markdown documentation (MyST parser) by araffin · Pull Request #2219 · DLR-RM/stable-baselines3 Description You can see the doc here: https://stable-baselines3.readthedocs.io/en/md-doc/ Should be identical to the rst one (I use the auto migrate tool and then fixed errors manually). For examp...

Thanks to the MyST parser and rst-to-myst, you can easily convert your documentation to Markdown while still keeping all the features of Sphinx =)

github.com/DLR-RM/stabl...
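For anyone wanting to try the same migration, the Sphinx side is mostly enabling the MyST parser in `conf.py` (a minimal sketch; real projects carry more settings):

```python
# conf.py (Sphinx configuration)
extensions = [
    "myst_parser",  # lets Sphinx parse Markdown (MyST) sources
]
# Keep .rst working during the migration, with .md alongside it.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```

A tool like rst-to-myst then converts the existing `.rst` files, with manual fixes afterwards (as the PR notes).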

2 months ago 1 0 0 0

bsky.app/profile/kahn...

2 months ago 1 0 1 0
Recent Advances in RL for Continuous Control (SOTA 2025) | CERN ML Workshop (YouTube video by Antonin Raffin)

The talk I gave about "Recent Advances in RL for Continuous Control" at CERN last year is now online =)

www.youtube.com/watch?v=Sb0d...

2 months ago 16 4 1 1
Implement MPO · Issue #9 · Stable-Baselines-Team/stable-baselines3-contrib Maximum a Posteriori Policy Optimisation (MPO) Reference implementation: https://github.com/deepmind/acme PyTorch implementation: https://github.com/fabiopardo/tonic

github.com/Stable-Basel...
Contributions are welcome =) (the issue is from 2020...)

Mainly a lack of time, plus the need for a clean and readable implementation (and a benchmark/comparison to other algos)

2 months ago 2 0 1 0

reppo is in the updated version (in the references); PQL might be what you are looking for? (it should be in the references too)
For MPO, I need to re-read the paper and try to implement and benchmark it.

2 months ago 0 0 1 0
Six Things I Learned Watching a Robotics Startup Die from the Inside | Rui Xu I spent a year as COO of a YC-backed robotics startup. The company didn't make it. Here's what I actually learned.

nice blog post about a humanoid robotics startup failure: ruixu.us/posts/six-th...

2 months ago 10 3 0 0
Recent Advances in Reinforcement Learning for Continuous Control | SOTA Early 2026

I also updated the slides recently for the RL Mannheim Workshop to include new SOTA algorithms from early 2026.

araffin.github.io/slides/advan...

2 months ago 2 0 0 0
To understand how our radio buttons work I need to understand two separate component libraries and hundreds of lines of React.

If you missed this post last week, it explains pretty well how modern frontend works these days. :/

https://paulmakeswebsite

2 months ago 19 2 0 0

Q-value overestimation animation for my upcoming talk about "Recent Advances in RL for Continuous Control" at the Mannheim RL Workshop
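The bias behind that animation is easy to reproduce: a max over noisy Q-estimates is biased upward even when each individual estimate is unbiased. A tiny sketch (numbers are illustrative, not from the talk):

```python
import random

random.seed(0)
true_q = [1.0, 1.0, 1.0]   # three actions, all equally good
trials = 10_000

gap = 0.0
for _ in range(trials):
    # Each estimate is unbiased: true value plus zero-mean noise...
    q_hat = [q + random.gauss(0.0, 0.5) for q in true_q]
    # ...but max() systematically picks the luckiest noise draw.
    gap += max(q_hat) - max(true_q)

print(gap / trials)  # positive (around +0.4), despite unbiased estimates
```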

2 months ago 3 1 1 0
The Formalism-Implementation Gap in Reinforcement Learning Research The last decade has seen an upswing in interest and adoption of reinforcement learning (RL) techniques, in large part due to its demonstrated capabilities at performing certain tasks at "super-human l...

This is something I talk about in my paper, where I suggest being explicit about γ_train (some methods use multiple gammas during training) and γ_eval.
One of my students is empirically investigating this and, as one would expect, it can have a huge impact.

arxiv.org/abs/2510.16175
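To see why reporting both matters, here's a toy example (not from the paper) of how much the discount factor changes what "return" means:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one trajectory."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [1.0] * 100                      # constant reward, 100 steps
train = discounted_return(rewards, 0.99)   # ~63.4 (a typical gamma_train)
eval_ = discounted_return(rewards, 1.0)    # 100.0 (undiscounted gamma_eval)
```

Same trajectory, very different numbers: comparing methods that report returns under different (often implicit) gammas is apples to oranges.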

2 months ago 12 1 1 1
Servo 0.0.4 showing new support for multiple windows

December in Servo…

🎤🧑‍🏫 FOSDEM talks next week!
🤹🪟 multiple windows
🪆🌐 HTTP proxy support
🔐🕵️ more SubtleCrypto algorithms
💽🗃️ new site data & network API

servo.org/blog/2026/01...

2 months ago 45 8 2 0
Docker Cheat Sheet — The Ultimate CLI Reference Comprehensive Docker CLI reference with commands for containers, images, volumes, networks, Compose, and Dockerfile.

Dr. Who plays with Docker How:
docker.how

3 months ago 43 9 2 0
The export and preview menu, with the "PDF" section unfolded.

HTML preview & export now available in the web app! With HTML export, you can create a website from the same Typst file as your PDFs. This makes it easy to create documents that feel just as at home on the web as they do in print.

3 months ago 50 6 1 1

This network analyzer is very effective and lets you find interesting accounts, e.g. people followed by many of the people you follow (but not by you).

bsky-follow-finder.theo.io

(Reposting this for folks who have joined Bsky more recently)

3 months ago 17 9 2 0
Open Source projects Join the conversation

People wanted our Open Source Organizations starter pack to include many projects, so we decided to give them their own starter pack.
go.bsky.app/HvKFRKa

3 months ago 29 6 2 0

"uv is fast because of what it doesn’t do, not because of what language it’s written in"

3 months ago 5 0 0 0

Using AI coding for data analysis without personal programming skill fills me with dread.

Small errors in the code poison results in ways that may not be visibly obvious.

LLMs are great when people verify outputs; the path to hell is when they don't.

3 months ago 24 4 1 2