We provide experimental results showing that our MCTS-based approach for solving GUMDPs in the single-trial regime succeeds in tasks such as exploration, imitation learning, and adversarial MDPs.
N/N
Then, we explore how online planning techniques can be used to solve GUMDPs in the single-trial regime. In particular, we show that we can use an MCTS algorithm to provably solve GUMDPs in the single-trial regime.
In our work, under the discounted infinite-horizon setting, we first provide fundamental results for policy optimization in the single-trial regime. We show that non-Markovianity matters, connect single-trial optimization with solving a particular MDP, and prove a hardness result.
In particular, the optimal policy for the single-trial regime can differ from the optimal policy for the multiple-trial regime. This is unfortunate since the single-trial regime is important in real-world settings, where policy performance is usually assessed from a single trajectory.
However, previous works (jmlr.org/papers/volum..., arxiv.org/pdf/2409.15128) pointed out that a policy's performance depends, in general, on the number of trials/trajectories drawn to evaluate it.
GUMDPs generalize the MDP framework by allowing the performance of a given policy to depend on a (possibly non-linear) function of the frequency of visitation of state-action pairs induced by the policy.
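To make the definition concrete, here is a minimal, illustrative sketch (not from the paper; the entropy utility, the toy trajectories, and the discounted weighting scheme are my own assumptions) of a non-linear utility evaluated on the visitation frequencies induced by a single trajectory:

```python
import math
from collections import defaultdict

def empirical_occupancy(trajectory, gamma=0.9):
    """Discounted empirical state-action visitation frequencies
    induced by a single trajectory of (state, action) pairs."""
    d = defaultdict(float)
    for t, sa in enumerate(trajectory):
        d[sa] += (1 - gamma) * gamma**t  # normalized discounted weighting
    return dict(d)

def entropy_utility(d):
    """An illustrative non-linear utility: entropy of the visitation
    frequencies (the kind of objective used for pure exploration)."""
    return -sum(p * math.log(p) for p in d.values() if p > 0)

# A trajectory that spreads its visits scores higher entropy
# than one that keeps revisiting the same state-action pair.
diverse = [("s0", "a0"), ("s1", "a1"), ("s2", "a0")]
repetitive = [("s0", "a0"), ("s0", "a0"), ("s0", "a0")]
print(entropy_utility(empirical_occupancy(diverse)) >
      entropy_utility(empirical_occupancy(repetitive)))  # True
```

A standard MDP is the special case where the utility is linear in these frequencies (the dot product with a reward vector); non-linear utilities like the entropy above are what GUMDPs add.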
Our work, "Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning", got accepted to ICLR 2026.
arxiv.org/abs/2505.15782
1/N
Joint work with Francisco S. Melo and Alberto Sardinha.
Here’s Pedro at yet another international conference! 🙌✨
GAIPS member Pedro P. Santos presented “Centralized training with hybrid execution in multi-agent reinforcement learning via predictive observation imputation” at #AAAI2026, Singapore 🇸🇬
📄 Check out his paper: doi.org/10.1016/j.ar...
Here are some photos of GAIPS member @pedrosantospps.bsky.social presenting his work at ICML 2025 in Vancouver and EWRL 2025 in Tübingen, Germany. His poster was selected as a "spotlight poster" (top 2.6% of the papers)! 🙌 Read his work here: icml.cc/virtual/2025...
Walking around posters at @icmlconf.bsky.social, I was happy to see some buzz around convex RL—a topic I’ve worked on and strongly believe in.
Thought I’d share a few ICML papers on this direction. Let’s dive in👇
But first… what is convex RL?
🧵
1/n
The paper can be found here: arxiv.org/pdf/2409.15128
We provide lower and upper bounds on the mismatch between the finite and infinite trials formulations for GUMDPs, as well as empirical results to support our claims, highlighting how the number of trajectories and the structure of the underlying GUMDP influence policy evaluation.
We show that the number of trials plays a key role in infinite-horizon GUMDPs: the expected performance of a given policy depends, in general, on the number of trials.
We contribute the first analysis of the impact of the number of trials, i.e., the number of randomly sampled trajectories, in infinite-horizon GUMDPs (considering both discounted and average formulations).
The general-utility Markov decision processes (GUMDPs) framework generalizes the MDP framework by considering objective functions that depend on the frequency of visitation of state-action pairs induced by a given policy.
Happy to share that our paper "The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes" got accepted as a spotlight poster at the International Conference on Machine Learning (ICML).