This is work done in collaboration with Todd Murphey and Allison Pinosky, who took the lead on the algorithmic implementation side of things. You can find her excellent work in the repository below.
GitHub repo:
github.com/MurpheyLab/M...
Here, we elucidate (and in many cases overcome) the role that violations of the i.i.d. assumption and of ergodicity play in many of deep RL's shortcomings when deployed in embodied systems. There's more to say, but for now please enjoy our preprint.
SI movies:
www.youtube.com/playlist?lis...
Three-panel image representing the third figure of our linked paper. Panel A depicts the MuJoCo swimmer environment, as well as one of the many formulations of the MaxDiff RL objective function. The MaxDiff RL objective seeks to balance diffusive exploration with task exploitation through a temperature parameter, alpha. Panel B depicts the effect of tuning alpha on system performance, showing that optimal performance is attained when there is substantial exploration without breaking ergodicity. Panel C depicts episodic reward curves for MaxDiff RL, SAC, and NN-MPPI on the MuJoCo swimmer environment. MaxDiff RL robustly attains state-of-the-art performance with near-zero variance across 10 random seeds.
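For readers skimming the thread, the temperature-weighted objective described in Panel A can be written schematically as below. This is a hedged summary, not the paper's exact formula: the precise notation and entropy functional are defined in the preprint.

```latex
J_{\text{MaxDiff}}(\pi)
  \;=\;
  \mathbb{E}_{P_{\pi}}\!\Big[\textstyle\sum_{t=0}^{T} r(x_t, u_t)\Big]
  \;+\;
  \alpha \, S\big[P_{\pi}(x_{0:T})\big]
```

Here $S[\cdot]$ denotes the entropy of the distribution over state paths induced by the policy $\pi$, and the temperature $\alpha$ trades off diffusive exploration (large $\alpha$) against task exploitation (small $\alpha$). Contrast MaxEnt RL, which instead adds a per-step action-entropy bonus $H[\pi(\cdot \mid x_t)]$.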
❗ Posting my new preprint as my first post ❗
Maximum Diffusion Reinforcement Learning
arxiv.org/abs/2309.15293
MaxDiff RL is a generalization of MaxEnt RL with provable seed-invariance and single-shot learning guarantees, built from the ground up to account for agent embodiment in decision-making.
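To make the relationship to MaxEnt RL concrete, here is a minimal toy sketch (my own illustration, not code from the paper's repository) of a temperature-weighted objective. In MaxEnt RL the entropy bonus is the per-step action entropy of the policy; MaxDiff RL instead scores the entropy of the agent's state-path distribution.

```python
import numpy as np

def soft_objective(rewards, entropies, alpha):
    """Schematic temperature-weighted objective (illustrative only).

    rewards:   per-step rewards along a trajectory
    entropies: per-step entropy contributions (action entropy in
               MaxEnt RL; path-entropy terms in MaxDiff RL)
    alpha:     temperature trading exploration vs. exploitation
    """
    return float(np.sum(rewards) + alpha * np.sum(entropies))

# alpha = 0 recovers the plain return; larger alpha rewards exploration.
print(soft_objective([1.0, 2.0], [0.5, 0.5], alpha=0.0))  # 3.0
print(soft_objective([1.0, 2.0], [0.5, 0.5], alpha=2.0))  # 5.0
```

The interesting design choice in the paper is *what* entropy the bonus measures; see the preprint and repository for the actual algorithm.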