Length‑Aware Sampling Boosts Policy Optimization for LLM Reasoning
Length-aware Sampling for Policy Optimization (LSPO) is a meta-RLVR method that uses response length to curb overthinking and reduce token count. The preprint was submitted on 1 Oct 2025. getnews.me/length-aware-sampling-bo... #lspo #rlvr