Off‑Policy Max‑Entropy RL with Future Visitation Rewards
An arXiv paper posted in December 2024 proposes an intrinsic reward based on the KL divergence between the future state-action visitation distribution and a reference distribution, which enables off-policy learning from replay buffers. getnews.me/off-policy-max-entropy-r... #maxentropyrl #offpolicy