
Posts by Ian Li

πŸ’‘ Check out more details below!

πŸ“„ Paper: arxiv.org/pdf/2603.00045
🌐 Project Page & Code: codd-dllm.github.io

Huge thanks to my amazing collaborators and advisors who made this work possible: @zoeshao.bsky.social @benjiewang.bsky.social @yuqirose.bsky.social @guyvdb.bsky.social @anjiliu.bsky.social

1 month ago

⚑ While RL-based methods push reasoning performance, they demand 150+ GPU hours to converge. CoDD achieves highly competitive gains at a fraction of that computational cost.

As a plug-and-play module trained on frozen backbone activations, it converges in just ~3 hours. 🀯


πŸƒβ€β™‚οΈ At inference time, while adding considerably lower overhead compared to finetuning, CoDD is particularly vital at low compute budgets. At 64 steps, where standard methods frequently mode-collapse into repetition, CoDD sustains coherent reasoning:


Instead of forcing the Transformer backbone to build a joint distribution from scratch, we augment it with a tractable probabilistic inference layer (structured as a probabilistic circuit). The LLM handles the complex semantics, while the tractable layer handles the joint dependencies. 🀝
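To make the idea concrete, here is a toy sketch (my own illustration, not the paper's code) of the simplest kind of probabilistic circuit: a mixture of fully factorized distributions. Each component is a product of per-position categoricals, so exact joint probabilities over masked positions stay tractable while the mixture captures cross-position dependencies that a single factorized output head cannot. The vocabulary, sizes, and hard-coded component assignments below are made up for illustration.

```python
import numpy as np

# Toy probabilistic circuit: a sum (mixture) node over product nodes,
# where each product is a set of independent per-position categoricals.
vocab = ["San", "New", "Francisco", "York"]
K, L, V = 2, 2, len(vocab)      # mixture components, positions, vocab size

weights = np.array([0.5, 0.5])  # mixture weights at the sum node
comp = np.zeros((K, L, V))      # comp[k, l]: categorical for position l
comp[0, 0, vocab.index("San")] = 1.0        # component 0: "San Francisco"
comp[0, 1, vocab.index("Francisco")] = 1.0
comp[1, 0, vocab.index("New")] = 1.0        # component 1: "New York"
comp[1, 1, vocab.index("York")] = 1.0

def joint_prob(tokens):
    """Exact joint probability: sum over components of the product of leaves."""
    idx = [vocab.index(t) for t in tokens]
    per_comp = np.prod([comp[:, l, idx[l]] for l in range(L)], axis=0)
    return float(weights @ per_comp)

joint_prob(["San", "Francisco"])  # coherent completion: probability 0.5
joint_prob(["San", "York"])       # incoherent pair: probability 0.0
```

Because every node is a sum or a product, marginals and conditionals can also be computed exactly in one bottom-up pass, which is what makes the layer "tractable".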


"He is from [MASK] [MASK]" β†’ "San York"? dLLMs fail because they ignore token dependencies. This Factorization Barrier arises from a structural misspecification: models are restricted to fully factorized outputs. We break this barrier with CoDD, enabling coherent parallel generation. πŸš€
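The "San York" failure can be reproduced in a few lines. This toy example (my own, not the paper's code) assumes the true distribution puts all mass on two coherent completions; a fully factorized model can only keep per-position marginals, so decoding the two masked tokens independently assigns positive probability to incoherent mixes:

```python
# True joint over the two masked tokens: only coherent completions have mass.
joint = {
    ("San", "Francisco"): 0.5,
    ("New", "York"): 0.5,
}

# A fully factorized model is limited to per-position marginals.
def marginal(pos):
    probs = {}
    for tokens, p in joint.items():
        probs[tokens[pos]] = probs.get(tokens[pos], 0.0) + p
    return probs

m0, m1 = marginal(0), marginal(1)

# Parallel (independent) decoding scores every pair as m0[a] * m1[b],
# so the incoherent "San York" gets probability 0.25 ...
p_san_york = m0["San"] * m1["York"]        # 0.25 under the factorized model
# ... even though the true joint assigns it zero.
p_true = joint.get(("San", "York"), 0.0)   # 0.0 under the true distribution
```

This is the structural misspecification: no amount of training fixes it, because the factorized family simply cannot represent the zero on ("San", "York").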
