
Posts by Thomas A. Berrueta

GitHub - MurpheyLab/MaxDiffRL

This is work done in collaboration with Todd Murphey and Allison Pinosky, who led the algorithmic implementation. You can find her excellent work in the repository below.

GitHub repo:
github.com/MurpheyLab/M...

2 years ago
MaxDiff RL Supplementary Movies: supplementary movies for the maximum diffusion reinforcement learning manuscript.

Here, we elucidate the role that violations of the i.i.d. property and ergodicity play in many of Deep RL's shortcomings when deployed in embodied systems, and in many cases overcome them. There's more to say, but for now please enjoy our preprint.

SI movies:
www.youtube.com/playlist?lis...

2 years ago
Three-panel image representing the third figure of our linked paper. Panel A depicts the MuJoCo swimmer environment, as well as one of the many formulations of the MaxDiff RL objective function, which seeks to balance diffusive exploration with task exploitation through a temperature parameter, alpha. Panel B depicts the role of tuning alpha on system performance, showing that optimal performance is attained when there is substantial exploration without breaking ergodicity. Panel C depicts episodic reward curves for MaxDiff RL, SAC, and NN-MPPI on the MuJoCo swimmer environment. MaxDiff RL robustly attains state-of-the-art performance with near-zero variance across 10 random seeds.


❗ Posting my new preprint as my first post ❗

Maximum Diffusion Reinforcement Learning

arxiv.org/abs/2309.15293

MaxDiff RL is a generalization of MaxEnt RL with provable seed-invariance and single-shot learning guarantees, built from the ground up to consider agent embodiment in decision making.
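To give a feel for the temperature trade-off mentioned in the figure description, here is a rough illustrative sketch (not the paper's implementation) of a generic entropy-regularized return, where alpha weighs an entropy bonus against task reward. In MaxEnt RL the entropy term comes from the policy; MaxDiff RL instead uses an entropy over state paths, so treat the function name and signature below as assumptions for illustration only:

```python
import numpy as np

def entropy_regularized_return(rewards, entropies, alpha):
    """Temperature-weighted objective: total reward plus alpha times an
    entropy term. alpha = 0 recovers the plain return; larger alpha
    favors exploration over task exploitation."""
    return float(np.sum(rewards) + alpha * np.sum(entropies))

# With alpha = 0 only the task reward counts; raising alpha rewards
# trajectories with higher entropy (more exploration).
plain = entropy_regularized_return([1.0, 1.0], [0.5, 0.5], alpha=0.0)    # 2.0
explore = entropy_regularized_return([1.0, 1.0], [0.5, 0.5], alpha=2.0)  # 4.0
```

Panel B of the figure above corresponds to sweeping alpha in an objective of this general shape and measuring downstream performance.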

2 years ago