
Posts by Felix Draxler

Huge thanks to amazing collaborators: Justus, Farrin, Theo, Sameer and Stephan!

Meet us at #ICLR: Apr 23, morning poster P3-#608

GitHub - mandt-lab/ptp: Parallel Token Prediction for Language Models (ICLR 2026)

Parallelize the model you care about with our code:
github.com/mandt-lab/ptp


We verify that this works in a speculative decoding experiment: we distill Vicuna-7B on conversations to predict the same output at greatly reduced latency. We achieve a 2.4x speedup over autoregressive decoding on diverse text tasks on a single GPU, with 3.2x possible with an optimized implementation.
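In a speculative-decoding setup like this, drafted tokens are checked against the target model with the standard accept test: keep a drafted token with probability min(1, p_target/p_draft). A minimal sketch of that rule, with illustrative names that are not taken from the mandt-lab/ptp code:

```python
import numpy as np

def accept_draft(p_target, p_draft, token, rng):
    """Standard speculative-decoding accept test: keep the drafted
    token with probability min(1, p_target[token] / p_draft[token]).
    Accepted tokens are distributed exactly as under the target model."""
    ratio = p_target[token] / p_draft[token]
    return rng.uniform() < min(1.0, ratio)

# If the target model assigns the token at least as much probability
# as the draft model did, the token is always accepted.
rng = np.random.default_rng(0)
p_target = np.array([0.5, 0.5])
p_draft = np.array([0.25, 0.75])
accepted = accept_draft(p_target, p_draft, 0, rng)
```

Rejected tokens are resampled from the residual distribution, so the overall output matches the target model while most calls are amortized across several drafted tokens.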


Autoregressive models produce text by predicting the probability histogram of the next token. A random auxiliary variable then determines which token is chosen.

Parallel Token Prediction directly learns which external randomness maps to which token. This allows it to predict many tokens at once.
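The "external randomness → token" map can be made concrete with inverse-CDF sampling: a single uniform variable fully determines which token is drawn. A minimal sketch of that view (illustrative code, not from the mandt-lab/ptp repository):

```python
import numpy as np

def sample_token(probs, u):
    """Inverse-CDF sampling: the uniform auxiliary variable u in [0, 1)
    fully determines which token index is drawn from the next-token
    histogram `probs`. Learning this (context, u) -> token map directly
    means the u's can be fixed up front and several positions decoded
    in parallel."""
    return int(np.searchsorted(np.cumsum(probs), u))

# With histogram [0.1, 0.6, 0.3], u lands in one of three CDF bins.
token = sample_token([0.1, 0.6, 0.3], 0.5)
```

Here u < 0.1 yields token 0, u in [0.1, 0.7) yields token 1, and u in [0.7, 1) yields token 2, so the choice is a deterministic function of u given the distribution.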


LLMs are autoregressive and slow? No! Parallel Token Prediction decodes multiple consistent tokens in one model call. Unlike discrete diffusion, PTP allows arbitrary dependencies within a single call. Practical: a 2.4x speedup.

github.com/mandt-lab/ptp
ICLR: Apr 23, morning poster P3-#608
