Most TD(0) papers: keep the cat in a box. Assume i.i.d. data, tiny step sizes, resets, data dropping, etc.
Ours: let the cat out. Nonlinear approximation, dependent data, real-world dynamics, and it still finds the value function.
arxiv.org/pdf/2502.05706
#reinforcementlearning #catsofML #TDzero