Step-Aware Policy Optimization Improves Reasoning in Diffusion LLMs
SAPO adds step‑level rewards to diffusion language models, aligning each denoising iteration with a hierarchical reasoning plan and boosting benchmark performance. Read more: getnews.me/step-aware-policy-optimi... #diffusionllm #sapoinnovation