Hierarchical Preference Learning Improves Long‑Horizon LLM Agents
Hierarchical Preference Learning adds a group‑level objective between trajectory‑ and step‑level DPO, using a curriculum that scales to complex sub‑task groups. Read more: getnews.me/hierarchical-preference-... #hierarchicalpreferencelearning #llmagents