A little clippie from the impromptu REPPO stream I was dragged into.
www.twitch.tv/shale_lee/cl...
#Vtuber #Reppo #Clip
Big if true 🤫: #REPPO works on Atari as well 😱 👾
Some tuning is still needed, but we are seeing results roughly on par with #PQN.
If you want to test out #REPPO (Atari is not integrated due to version incompatibilities between envpool and JAX), check out github.com/cvoelcker/re...
#reinforcementlearning
GIF showing two plots that illustrate the REPPO algorithm. On the left, four curves track the return of an optimization function; on the right, the optimization paths over the objective function are visualized. The GIF shows that Monte Carlo gradient estimators have high variance and fail to converge, while surrogate-function estimators converge smoothly but may find suboptimal solutions if the surrogate function is imprecise.
🔥 Presenting Relative Entropy Pathwise Policy Optimization #REPPO 🔥
Off-policy #RL (e.g. #TD3) trains by differentiating a critic, while on-policy #RL (e.g. #PPO) uses Monte Carlo gradients. But is that necessary? Turns out: no! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019
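For intuition, here is a minimal stdlib-only sketch (not code from the REPPO repo) contrasting the two estimator families on a toy objective, d/dmu E[x^2] with x ~ N(mu, sigma), whose true gradient is 2*mu. The score-function (Monte Carlo) estimator is unbiased but noisy; the pathwise (reparameterization) estimator, which is what differentiating a critic gives you, is also unbiased here but has much lower variance:

```python
# Toy comparison: score-function vs. pathwise gradient estimators.
# Objective: d/dmu E[x^2], x ~ N(mu, sigma); the true gradient is 2*mu.
# Illustrative sketch only; names and setup are not from the REPPO paper.
import random
import statistics

def sample_gradients(mu, sigma, n, seed=0):
    rng = random.Random(seed)
    score_fn, pathwise = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps  # reparameterized sample
        # Score-function (REINFORCE-style Monte Carlo) estimator:
        # f(x) * d/dmu log N(x; mu, sigma) = x^2 * (x - mu) / sigma^2
        score_fn.append(x * x * (x - mu) / sigma**2)
        # Pathwise (reparameterization) estimator:
        # d/dmu f(mu + sigma * eps) = 2 * x
        pathwise.append(2.0 * x)
    return score_fn, pathwise

score, path = sample_gradients(mu=1.0, sigma=1.0, n=10_000)
print(statistics.mean(score), statistics.variance(score))  # noisy estimate of 2.0
print(statistics.mean(path), statistics.variance(path))    # ~2.0, far lower variance
```

Both estimators target the same gradient, but the variance gap is exactly the left-panel behavior in the GIF above: the Monte Carlo curves bounce around while the pathwise ones converge smoothly.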