Policy Gradient Method Advances Multi‑Objective Reinforcement Learning
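A policy gradient algorithm for multi‑objective RL maximizes a concave utility of the objectives' expected returns and reaches an ε‑neighborhood of the optimum after O(M⁴σ²/((1‑γ)⁸ε⁴)) trajectories, where M is the number of objectives and γ the discount factor (arXiv:2105.14125). Read more: getnews.me/policy-gradient-method-a... #multigoalrl #policygradient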
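A minimal sketch of the general idea, not the paper's algorithm. A concave utility f is applied to the vector of per‑objective returns J, and the policy follows the chain‑rule gradient Σₘ (∂f/∂Jₘ)·∇Jₘ. The following are assumptions made for illustration: the one‑step toy problem, the reward table REWARDS, the choice f(J) = Σₘ log Jₘ, and the helper names. Gradients here are computed exactly rather than estimated from sampled trajectories, so this does not illustrate the sample‑complexity bound.

```python
import math

# Hypothetical toy problem (not from the paper): one step, K actions,
# each action returns a vector of M objective rewards.
REWARDS = [
    [1.0, 0.0],  # action 0 only serves objective 0
    [0.0, 1.0],  # action 1 only serves objective 1
    [0.4, 0.4],  # action 2 is a weak compromise
]


def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def objective_values(pi):
    """Expected reward J_m for each objective under policy pi."""
    num_obj = len(REWARDS[0])
    return [sum(p * r[m] for p, r in zip(pi, REWARDS)) for m in range(num_obj)]


def utility(J):
    """Assumed concave scalarization f(J) = sum_m log J_m."""
    return sum(math.log(j) for j in J)


def utility_gradient(theta):
    """Exact gradient of f(J(theta)) via the chain rule.

    df/dtheta_k = sum_m (1 / J_m) * dJ_m/dtheta_k, and for a softmax policy
    dJ_m/dtheta_k = pi_k * (R[k][m] - J_m).
    """
    pi = softmax(theta)
    J = objective_values(pi)
    return [
        pi[k] * sum((REWARDS[k][m] - J[m]) / J[m] for m in range(len(J)))
        for k in range(len(theta))
    ]


def train(steps=2000, lr=0.5):
    theta = [0.0] * len(REWARDS)
    for _ in range(steps):
        grad = utility_gradient(theta)
        theta = [t + lr * g for t, g in zip(theta, grad)]  # gradient ascent on f
    pi = softmax(theta)
    return pi, objective_values(pi)


if __name__ == "__main__":
    pi, J = train()
    print("policy:", [round(p, 3) for p in pi])
    print("objective values:", [round(j, 3) for j in J])
    print("utility:", round(utility(J), 4))
```

With the log utility, the optimum mixes actions 0 and 1 equally, so J should approach roughly (0.5, 0.5) and the compromise action's probability should shrink toward zero. The concave utility rewards balancing the objectives instead of maximizing either one alone.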