#REPPO

A little clippie from the impromptu REPPO stream I was dragged into.
www.twitch.tv/shale_lee/cl...
#Vtuber #Reppo #Clip


Big if true 🤫: #REPPO works on Atari as well 😱 👾 🚀

Some tuning is still needed, but we are seeing results roughly on par with #PQN.

If you want to test out #REPPO (Atari is not integrated due to issues with envpool and the JAX version), check out github.com/cvoelcker/re...

#reinforcementlearning

GIF showing two plots that illustrate the REPPO algorithm. On the left, four curves track the return during optimization; on the right, the optimization paths over the objective function are visualized. The GIF shows that Monte Carlo gradient estimators have high variance and fail to converge, while surrogate-function estimators converge smoothly but may find suboptimal solutions if the surrogate function is imprecise.
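The tradeoff the GIF illustrates can be made concrete with a toy JAX sketch (purely illustrative, not from the paper or repo: the objective, the deliberately shifted surrogate, and all names here are invented). A score-function (Monte Carlo) estimator and a pathwise estimator through an imprecise surrogate both optimize a Gaussian policy on f(x) = -(x - 2)²:

```python
import jax
import jax.numpy as jnp

sigma = 0.5

def f(x):
    # True objective: maximized at x = 2.0.
    return -(x - 2.0) ** 2

def f_hat(x):
    # Imprecise surrogate ("critic"): its optimum sits at x = 1.8 instead.
    return -(x - 1.8) ** 2

def score_fn_grad(mu, key, n=16):
    # Monte Carlo (score-function / REINFORCE) estimate of d/dmu E[f(x)]
    # for x ~ N(mu, sigma^2): mean of f(x) * d/dmu log p(x | mu).
    x = mu + sigma * jax.random.normal(key, (n,))
    return jnp.mean(f(x) * (x - mu) / sigma**2)

def pathwise_grad(mu, key, n=16):
    # Pathwise estimate: reparameterize x = mu + sigma * eps and
    # differentiate the surrogate directly.
    eps = jax.random.normal(key, (n,))
    return jax.grad(lambda m: jnp.mean(f_hat(m + sigma * eps)))(mu)

key = jax.random.PRNGKey(0)
mu_mc = mu_pw = jnp.array(0.0)
lr = 0.05
for _ in range(200):
    key, k1, k2 = jax.random.split(key, 3)
    mu_mc = mu_mc + lr * score_fn_grad(mu_mc, k1)  # high-variance, jittery
    mu_pw = mu_pw + lr * pathwise_grad(mu_pw, k2)  # smooth, but biased
print(mu_mc, mu_pw)
```

Run long enough, mu_mc should rattle noisily around the true optimum at 2.0 while mu_pw converges smoothly to the surrogate's optimum at 1.8, mirroring the two failure modes in the animation.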

🔥 Presenting Relative Entropy Pathwise Policy Optimization #REPPO 🔥
Off-policy #RL (e.g. #TD3) trains by differentiating a critic, while on-policy #RL (e.g. #PPO) uses Monte Carlo gradients. But is that necessary? Turns out: no! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019
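For a sense of what "critic gradients" means mechanically, here is a minimal sketch, assuming a reparameterized tanh-Gaussian policy and a stand-in quadratic critic (everything here is illustrative, not the actual REPPO implementation, and it omits the relative-entropy machinery the name refers to):

```python
import jax
import jax.numpy as jnp

def policy(theta, s, eps):
    # Tanh-Gaussian policy, reparameterized as a = mu(s) + std * eps,
    # so the action is a differentiable function of theta.
    mu = jnp.tanh(theta["w"] @ s + theta["b"])
    return mu + jnp.exp(theta["log_std"]) * eps

def q_hat(s, a):
    # Stand-in differentiable critic Q(s, a); in the real algorithm
    # this is a learned value network.
    return -jnp.sum((a - 0.1 * jnp.sum(s)) ** 2)

def objective(theta, states, eps):
    # J(theta) = E_s[ Q(s, pi_theta(s, eps)) ]: the policy gradient
    # flows through the critic, not through log pi as in PPO.
    acts = jax.vmap(lambda s, e: policy(theta, s, e))(states, eps)
    return jnp.mean(jax.vmap(q_hat)(states, acts))

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
states = jax.random.normal(k1, (32, 3))          # placeholder state batch
eps = jax.random.normal(k2, (32, 2))             # fixed policy noise
theta = {"w": jnp.zeros((2, 3)), "b": jnp.zeros(2), "log_std": jnp.zeros(2)}
grads = jax.grad(objective)(theta, states, eps)  # pathwise policy gradient
```

TD3 and SAC compute this same kind of critic-differentiating gradient from a replay buffer; the claim in the post is that it can be computed from on-policy data as well.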
