Posts by Jason Weston

Our new work on continuous chain of thought.

1 year ago 4 0 0 0

Analysis: AD picks high temp for creative & low for fact-seeking prompts, automatically via training.

Our methods AD & Latent Pref Optimization are general & can be applied to train other hyperparams or latent features.

Excited to see how people could *adapt* this research!
🧵4/4

1 year ago 2 0 0 0

We train on a mix of tasks:
GSM8K - requires factuality (low temp)
Stories - requires creativity (high temp)
UltraFeedback - general instruction following, requires mix

Results: Adaptive Decoding outperforms any fixed temperature, automatically choosing via the AD layer.
🧵3/4

1 year ago 2 0 2 0

Recipe 👩‍🍳:
Adaptive Decoder (AD) Layer:
- Assigns probability to each hyperparam choice (decoding temp) given hidden state. Given temp, sample a token.
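
A minimal sketch of one AD decoding step, to make the bullet above concrete. Everything here is an assumption for illustration, not the paper's code: the temperature grid `TEMPERATURES`, the linear head in `ad_layer`, and the helper names are all hypothetical.

```python
import math
import random

# Assumed discrete grid of temperature choices (hypothetical values).
TEMPERATURES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def ad_layer(hidden, weights):
    """Assumed linear head: one score per temperature choice, then softmax."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return softmax(scores)

def sample_token(logits, temp, rng):
    """Sample a token id from logits at the given temperature."""
    if temp == 0.0:
        # Temperature 0 is treated as greedy decoding (argmax).
        return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax([l / temp for l in logits])
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def adaptive_decode_step(hidden, logits, weights, rng):
    """Pick a temperature from the AD distribution, then sample a token with it."""
    temp_probs = ad_layer(hidden, weights)
    idx = rng.choices(range(len(TEMPERATURES)), weights=temp_probs, k=1)[0]
    temp = TEMPERATURES[idx]
    return temp, sample_token(logits, temp, rng)
```

Calling `adaptive_decode_step` once per generated token gives the per-token behaviour; a real model would feed in its own last-layer hidden state and next-token logits.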

Training (Latent PO):
- Train AD by sampling params+tokens & use a reward model to form chosen & rejected hyperparam preference pairs
🧵2/4
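
A rough sketch of the training idea in this post, under stated assumptions: candidates are ranked by a reward function, the best and worst become the chosen and rejected pair, and a DPO-style loss is applied to the log-probabilities of the latent temperature choices. The DPO-style form, `beta`, and all function names are illustrative assumptions; the exact loss in the paper may differ.

```python
import math

def build_preference_pair(candidates, reward_fn):
    """Rank sampled (temperature, response) candidates by reward.

    Returns (chosen, rejected): the highest- and lowest-reward candidates.
    Each candidate is a dict with hypothetical keys "temp" and "response".
    """
    ranked = sorted(candidates, key=lambda c: reward_fn(c["response"]))
    return ranked[-1], ranked[0]

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def latent_preference_loss(logp_chosen, logp_rejected,
                           ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Assumed DPO-style loss on the AD layer's log-probs of latent choices.

    logp_*     : log-prob the trained AD layer gives the chosen/rejected temperature
    ref_logp_* : the same quantities under a frozen reference AD layer
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -log_sigmoid(margin)
```

Minimizing this loss pushes the AD layer toward the temperature that produced the higher-reward response and away from the one that produced the lower-reward response.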

1 year ago 1 0 1 0

🚨 Adaptive Decoding via Latent Preference Optimization 🚨
- New layer for Transformer, selects decoding params automatically *per token*
- Learnt via new method Latent Preference Optimization
- Outperforms any fixed temperature decoding, choosing creativity or factuality
arxiv.org/abs/2411.09661
🧵1/4

1 year ago 43 6 2 0