Check out more details below!
Paper: arxiv.org/pdf/2603.00045
Project Page & Code: codd-dllm.github.io
Huge thanks to my amazing collaborators and advisors who made this work possible: @zoeshao.bsky.social @benjiewang.bsky.social @yuqirose.bsky.social @guyvdb.bsky.social @anjiliu.bsky.social
Posts by Ian Li
RL-based methods push reasoning performance, but they demand 150+ GPU hours to converge. CoDD achieves highly competitive gains at a fraction of that computational cost.
As a plug-and-play module trained on frozen backbone activations, it converges in just ~3 hours.
At inference time, CoDD adds far lower overhead than finetuning, and it is particularly vital at low compute budgets. At 64 steps, where standard methods frequently mode-collapse into repetition, CoDD sustains coherent reasoning:
Instead of forcing the Transformer backbone to build a joint distribution from scratch, we augment it with a tractable probabilistic inference layer (structured as a probabilistic circuit). The LLM handles the complex semantics, while the tractable layer handles the joint dependencies.
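A toy sketch of this division of labor: the backbone supplies per-position marginals, and a separate joint model rescores token combinations. The dict-based "joint layer" and all numbers here are illustrative assumptions, not CoDD's actual probabilistic-circuit implementation.

```python
# Backbone marginals for two masked positions (toy numbers).
marginals = [
    {"New": 0.5, "San": 0.5},         # P(token_1) from the LLM
    {"York": 0.5, "Francisco": 0.5},  # P(token_2) from the LLM
]

# Stand-in for the tractable layer: joint dependency weights over
# token pairs (a real circuit would represent this compactly).
joint = {
    ("New", "York"): 1.0, ("San", "Francisco"): 1.0,
    ("New", "Francisco"): 0.0, ("San", "York"): 0.0,
}

# Score each pair by backbone marginals times the joint weight,
# so incoherent pairs like ("San", "York") are ruled out.
best = max(
    ((a, b) for a in marginals[0] for b in marginals[1]),
    key=lambda ab: marginals[0][ab[0]] * marginals[1][ab[1]] * joint[ab],
)
print(best)  # a coherent pair such as ("New", "York")
```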
"He is from [MASK] [MASK]" β "San York"? dLLMs fail because they ignore token dependencies. This Factorization Barrier arises from a structural misspecification: models are restricted to fully factorized outputs. We break this barrier with CoDD, enabling coherent parallel generation. π