
Posts by Lucas Alegre

GitHub - LucasAlegre/sumo-rl: Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasium, PettingZoo, and popular RL libraries.

Wow, I just realized SUMO-RL reached 1,000 stars on GitHub! 🥳

I created SUMO-RL while I was an undergrad, getting familiar with RL. Traffic signal control is a very cool real-world problem in which RL shines. I'm glad that the community still benefits from it!

github.com/LucasAlegre/...

1 month ago

It was a lot of fun to present our latest paper, "Constructing an Optimal Behavior Basis for the Option Keyboard", at NeurIPS this week!

Paper: openreview.net/pdf?id=D4gOo...

#neurips2025

4 months ago

Sure, only ~2 weeks to review 5 papers for ICLR. No doubt all reviewers will have plenty of time to write careful and thoughtful reviews in the coming weeks, since they have nothing else to do.

It is insane to expect a fair reviewing system in these terms.

6 months ago

It is really cool to see our work on multi-step GPI being cited in this amazing survey! :)

proceedings.neurips.cc/paper_files/...

7 months ago

On average I get good scores, but it has happened to me before that 3 of 4 reviewers accepted the paper and the one negative reviewer convinced the AC to reject.

8 months ago

And now I got the classic rebuttal response:

"I have no concerns with the paper, all the theory is great, but since you did not run experiments in expensive domains with image-based environments, I will not increase my score".

The goal of experiments is to validate the claims! Not to beat Atari!

8 months ago

I got the classic NeurIPS review: "why did you not compare with [completely unrelated method whose comparison would not help support any of the paper's claims]?"

Now I'm debating whether to spend my weekend running this useless experiment or to argue with the reviewer.

8 months ago

Finally, reporting only IQM may compromise scientific transparency and fairness, as it can mask poor or unstable performance. Agarwal et al. (2021), who introduced IQM in this context, recommend using it in conjunction with other statistics rather than as a standalone measure.

10 months ago

Yes, Interquartile Mean (IQM) is a robust statistic that reduces the influence of outliers. But it does not by itself provide a clear and fair analysis of performance. In particular, IQM does not capture the full distribution of returns and may hide important information about variability and risk.
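As a rough illustration (made-up numbers, not from any real benchmark), here is a minimal sketch of how two algorithms with very different run distributions can have the exact same IQM:

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of sorted values."""
    s = np.sort(np.asarray(scores, dtype=float))
    cut = len(s) // 4  # drop the bottom and top quartiles
    return s[cut:len(s) - cut].mean()

# Hypothetical final returns from 8 runs of two algorithms:
stable   = [90, 92, 94, 95, 96, 97, 98, 99]   # consistent across seeds
unstable = [ 0, 92, 94, 95, 96, 97, 98, 200]  # one crash, one lucky seed

print(iqm(stable), iqm(unstable))        # identical IQM: 95.5 and 95.5
print(np.std(stable), np.std(unstable))  # very different variability
```

Both the crashed run and the outlier get trimmed away, so the IQM alone cannot distinguish the two.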

10 months ago
Deep Reinforcement Learning at the Edge of the Statistical Precipice Our findings call for a change in how we report performance on benchmarks when using only a few runs, for which we present more reliable protocols accompanied with an open-source library.

While I really like the paper "Deep Reinforcement Learning at the Edge of the Statistical Precipice" (openreview.net/forum?id=uqv...), I have seen papers that evaluate performance using only the IQM metric and claim, citing this paper, that it is a fairer metric than the mean. That is simply wrong.

10 months ago

This work was done during my time as an intern at Disney Research Zürich. It was amazing and really fun to develop this idea with the Robotics Team!

10 months ago

Check out AMOR now on arXiv:

Paper: arxiv.org/abs/2505.23708
Full Video: youtube.com/watch?v=gQid...

#SIGGRAPH2025 #RL #robotics

10 months ago

A base policy with uniform weights might fail on challenging motions, but with a few weight tweaks, it nails them. Like this double spin. 🌀😵‍💫

Curious how tuning weights mid-motion can help close the sim-to-real gap and unlock dynamic, expressive behaviors?

10 months ago

AMOR trains a single policy conditioned on reward weights and motion context, letting you fine-tune the reward after training.
Want smoother motions? Better accuracy? Just adjust the weights — no retraining needed!
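The core mechanism here, linear scalarization of a vector-valued reward, can be sketched in a few lines (the objective names and numbers below are made up for illustration; the paper has the full architecture):

```python
import numpy as np

# Hypothetical per-objective rewards for one transition,
# e.g. [tracking accuracy, smoothness, energy cost]:
reward_vec = np.array([0.8, 0.2, -0.1])

def scalarize(reward_vec, weights):
    """Linear scalarization: the scalar reward a weight-conditioned
    policy is trained to maximize."""
    return float(np.dot(weights, reward_vec))

# Because the policy also receives the weights as input, changing them
# at deployment changes behavior without any retraining:
w_accurate = np.array([0.8, 0.1, 0.1])  # favor tracking accuracy
w_smooth   = np.array([0.2, 0.7, 0.1])  # favor smoothness

print(scalarize(reward_vec, w_accurate))  # ≈ 0.65
print(scalarize(reward_vec, w_smooth))    # ≈ 0.29
```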

10 months ago

We are excited to share our #SIGGRAPH2025 paper,

“AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning”!
Lucas Alegre*, Agon Serifi*, Ruben Grandia, David Müller, Espen Knoop, Moritz Baecher

10 months ago
AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning (YouTube video by DisneyResearchHub)

Annoyed by having to retrain your entire policy just because your reward weights did not quite work on the real robot? 🤖

www.youtube.com/watch?v=gQid...

10 months ago

Thank you, Peter! :)

10 months ago

I'm really glad to have been selected as one of the ICML 2025 Top Reviewers!

Too bad I won't be able to go since my last submission was not accepted, even with scores Accept, Accept, Weak Accept, and Weak Reject 🫠

10 months ago

Last week, I was at @khipu-ai.bsky.social in Santiago, Chile. It was really amazing to see so many great speakers and researchers from Latin America together!

1 year ago

Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning. In a series of papers beginning...

RL is so back!

(well, for some of us, it never really left)

awards.acm.org/about/2024-t...

1 year ago
A practical guide to multi-objective reinforcement learning and planning - Autonomous Agents and Multi-Agent Systems Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learnin...

Thank you!
link.springer.com/article/10.1...
This paper is a great starting point!

1 year ago

Thank you! 😊

1 year ago

Finally, I would like to thank my advisors, Prof. Ana Bazzan and Prof. Bruno C. da Silva; Prof. Ann Nowé, who hosted me at VUB during my PhD research stay; and Disney Research Zürich, where I interned.

I am very grateful to everyone with whom I had the chance to collaborate on all these amazing projects! 💙

1 year ago

I believe all these contributions open up room for many interesting ideas in multi-policy RL methods, especially in transfer learning (SFs & GPI) and multi-objective RL settings! 🚀

1 year ago
GitHub - Farama-Foundation/MO-Gymnasium: Multi-objective Gymnasium environments for reinforcement learning

* MO-Gymnasium (github.com/Farama-Found...) is a library of MORL environments; and

* MORL Baselines (github.com/LucasAlegre/...) is a library of MORL algorithms.

Both have become standard tools in MORL research, with over 100k downloads in the past year!
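For intuition, the distinguishing feature of a MORL environment is that step() returns a reward *vector* rather than a scalar. A toy sketch (a hypothetical class, not the MO-Gymnasium API, though the library follows the same Gymnasium-style interface):

```python
import numpy as np

class ToyMOEnv:
    """Minimal multi-objective environment sketch. Like MO-Gymnasium
    environments, step() returns a vector reward, one entry per objective."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action: 0 = stay, 1 = move forward
        self.pos += action
        # Two objectives: [progress made, negative energy cost]
        reward = np.array([float(action), -0.1 * action])
        terminated = self.pos >= 3
        return self.pos, reward, terminated

env = ToyMOEnv()
env.reset()
obs, reward, done = env.step(1)
print(reward)  # a length-2 vector, not a scalar
```

A MORL algorithm then has to decide how to trade off the entries of this vector, e.g. via a weight vector or a Pareto-front approximation.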

1 year ago

Besides the theoretical and algorithmic contributions, we also introduced an open-source toolkit for MORL research!

NeurIPS D&B 2023 Paper - openreview.net/pdf?id=jfwRL...

1 year ago

Next, we further explored how to leverage approximate models of the environment to improve zero-shot policy transfer. Our method, ℎ-GPI, interpolates between model-free GPI and fully model-based planning as a function of the planning horizon ℎ.

NeurIPS 2023 Paper - openreview.net/pdf?id=KFj0Q...
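A rough sketch of that interpolation, with toy deterministic dynamics and made-up successor features (not the paper's code): h = 0 recovers plain model-free GPI, and larger h relies more on planning with the model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_policies, d = 4, 2, 3, 2
gamma = 0.9

# Toy deterministic model and reward features (all made up):
next_state = rng.integers(n_states, size=(n_states, n_actions))
phi = rng.uniform(size=(n_states, n_actions, d))              # reward features
psi = rng.uniform(size=(n_policies, n_states, n_actions, d))  # SFs of known policies
w = np.array([0.5, 0.5])  # new task: r = phi . w

def gpi_value(s):
    """Model-free GPI value estimate: best known policy's Q on task w."""
    return (psi[:, s] @ w).max()

def h_gpi_value(s, h):
    """h-step lookahead with the model, bottoming out in the GPI value."""
    if h == 0:
        return gpi_value(s)
    return max(phi[s, a] @ w + gamma * h_gpi_value(next_state[s, a], h - 1)
               for a in range(n_actions))

print(h_gpi_value(0, 0), h_gpi_value(0, 2))
```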

1 year ago

Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization | Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems

Next, we further explored these ideas and introduced two novel MORL algorithms that exploit GPI to increase sample efficiency in MORL: GPI-LS and GPI-PD.

AAMAS'23 paper: tinyurl.com/aamas23

1 year ago
Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer In many real-world applications, reinforcement learning (RL) agents might have to solve multiple tasks, each one typically modeled via a reward function. If reward functions are expressed linearly,...

By exploiting these connections, we introduced SFOLS, a method that constructs a set of policies and combines them via GPI, with the guarantee of obtaining the optimal policy for any novel linearly expressible task!

ICML'22 paper: proceedings.mlr.press/v162/alegre2...
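The GPI step itself can be sketched in a few lines (made-up successor features, not the paper's code): with SFs ψ and task weights w, every known policy's Q-values on a new task come for free, and GPI acts greedily over all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_policies, n_actions, d = 3, 4, 2  # d = reward-feature dimension

# Hypothetical successor features psi[i, a] of each known policy i
# for each action a at the current state:
psi = rng.uniform(size=(n_policies, n_actions, d))

# A novel task expressed linearly: r = phi(s, a, s') . w
w = np.array([0.7, 0.3])

# Q-values of every known policy on the new task, with zero learning:
q = psi @ w  # shape: (n_policies, n_actions)

# GPI: act greedily w.r.t. the maximum over policies; the resulting
# policy is guaranteed to be at least as good as every policy in the set.
gpi_action = int(np.argmax(q.max(axis=0)))
print(gpi_action)
```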

1 year ago

It all started when we discovered and introduced connections between Successor Features and multi-objective RL (MORL):

1 year ago