L1 Model Controls Reasoning Length with Reinforcement Learning
Length Controlled Policy Optimization (LCPO) lets the 1.5B-parameter L1 model follow a user-set reasoning length while matching GPT-4o accuracy under equal token limits. getnews.me/l1-model-controls-reason... #l1model #lcpo