
Posts by Giwon Hong

MMLU-Redux Poster at NAACL 2025

MMLU-Redux just touched down at #NAACL2025! 🎉
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋

11 months ago
Image illustrating that ALM can enable Ensembling, Transfer to Bytes, and general Cross-Tokenizer Distillation.

We created Approximate Likelihood Matching (ALM), a principled (and very effective) method for *cross-tokenizer distillation*!

With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level, and a bunch more! 🧵

1 year ago
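For intuition only, here is a minimal sketch of what likelihood matching across tokenizers could look like: both models score the *same* text, each under its own tokenizer, so their sequence-level log-likelihoods are directly comparable even though token boundaries differ. This is not the paper's actual objective; `teacher_seq_logprob` and `student_seq_logprob` are hypothetical helpers returning the total log p(text) as a scalar tensor.

```python
import torch

def alm_style_loss(text, teacher_seq_logprob, student_seq_logprob):
    # Frozen teacher scores the text under its own tokenizer.
    with torch.no_grad():
        lp_teacher = teacher_seq_logprob(text)
    # Trainable student scores the same text under a different tokenizer.
    lp_student = student_seq_logprob(text)
    # Penalise the gap between the two sequence-level log-likelihoods
    # (the real ALM objective is defined in the paper).
    return (lp_student - lp_teacher) ** 2
```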

Joining the Generative AI Lab (GAIL, gail.ed.ac.uk) at the University of Edinburgh as a GAIL Fellow! Excited for what's ahead 🤗

1 year ago
Mixtures of In-Context Learners
In-context learning (ICL) adapts LLMs by providing demonstrations without fine-tuning the model parameters; however, it does not differentiate between demonstrations and quadratically increases the co...

Work done with: @pminervini.bsky.social @edoardo-ponti.bsky.social @emilevankrieken.com and Nikolay Malkin

Paper: arxiv.org/abs/2411.02830
(🧵8/n)

1 year ago

๐Ÿ” Conclusion: ๐— ๐—ผ๐—œ๐—–๐—Ÿ offers a robust, efficient approach for combining demonstrations (experts), significantly boosting accuracy over baselines. ๐— ๐—ผ๐—œ๐—–๐—Ÿ is also resilient to low-quality demonstrations and achieves improved data and computational efficiency. (๐Ÿงต7/n)

1 year ago

โš™๏ธ Data and Compute Efficiency of ๐— ๐—ผ๐—œ๐—–๐—Ÿ: We find that ๐— ๐—ผ๐—œ๐—–๐—Ÿ is more efficient in terms of data and computation compared to conventional (concat-based) ICL! (๐Ÿงต6/n)

1 year ago

📉 Noisy and Imbalanced Demonstrations: By assigning weights to each demonstration subset, MoICL can effectively handle practical settings where data quality varies. (🧵5/n)

1 year ago

๐ŸŒGeneralization to Unseen Demonstrations: ๐™จ๐™˜๐™–๐™ก๐™–๐™ง weights require predefined demonstration subsets.
Using ๐™ƒ๐™ฎ๐™ฅ๐™š๐™ง-๐™ฃ๐™š๐™ฉ๐™ฌ๐™ค๐™ง๐™ โ€”a smaller fine-tuned hyper-network that dynamically generates weights for each expert based on all concatenated demonstration subsets. (๐Ÿงต4/n)

1 year ago
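A minimal sketch of the hyper-network idea above, under our own assumptions: a tiny trainable MLP scores an embedding of each (possibly unseen) demonstration subset. Here `subset_embeddings` would come from some hypothetical frozen encoder of the concatenated demonstrations; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class HyperWeighter(nn.Module):
    """Maps an embedding of each demonstration subset to a weight logit,
    so subsets unseen at training time can still be weighted."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1)
        )

    def forward(self, subset_embeddings):                 # (k, dim)
        logits = self.mlp(subset_embeddings).squeeze(-1)  # (k,)
        return torch.softmax(logits, dim=-1)              # mixture weights
```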

๐Ÿ“Š ๐— ๐—ผ๐—œ๐—–๐—Ÿ in Classification Tasks: ๐— ๐—ผ๐—œ๐—–๐—Ÿ outperformed Baseline ICL on 5 out of 7 datasets!
Using ๐™จ๐™˜๐™–๐™ก๐™–๐™ง weightsโ€”a vector of trainable parameters that assign each expert a weightโ€”we fine-tuned how demonstration subsets are combined. (๐Ÿงต3/n)

1 year ago
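To make the scalar-weight variant concrete, a hedged sketch of one training step: only the k weights are learned, by minimising the negative log-likelihood of the gold label under the mixture. `expert_probs` would be the (k, vocab) next-token distributions from the frozen LLM, one per demonstration subset (see the forward-pass sketch below).

```python
import torch

k = 4                                    # number of experts (illustrative)
w = torch.zeros(k, requires_grad=True)   # the only trainable parameters
opt = torch.optim.Adam([w], lr=1e-2)

def train_step(expert_probs, gold_token_id):
    # Mix the experts' next-token distributions with softmax-normalised weights.
    mixture = torch.softmax(w, dim=0) @ expert_probs   # (vocab,)
    # Negative log-likelihood of the gold label under the mixture.
    loss = -torch.log(mixture[gold_token_id] + 1e-9)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```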

🚀 How does MoICL improve In-Context Learning? MoICL prompts an LLM with multiple demonstration subsets, obtaining multiple experts, and merges their predictions via a trainable weighting function; it doesn't require any fine-tuning of the LLM parameters! (🧵2/n)

1 year ago
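A minimal sketch of this forward pass, assuming a hypothetical `llm_next_token_probs(prompt) -> (vocab,)` wrapper around a frozen LLM (the prompt formatting here is illustrative, not the paper's):

```python
import torch

def moicl_predict(subsets, query, llm_next_token_probs, w):
    # One expert per demonstration subset: the same frozen LLM prompted
    # with that subset followed by the query.
    expert_probs = torch.stack([
        llm_next_token_probs("\n\n".join(list(subset) + [query]))
        for subset in subsets
    ])                                              # (k, vocab)
    # Merge the expert predictions with the trainable weights; no LLM
    # parameters are ever updated.
    return torch.softmax(w, dim=0) @ expert_probs   # (vocab,)
```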

🤔 How to achieve efficient ICL without storing a huge dataset in one prompt?
💡 Mixtures of In-Context Learners (MoICL): we treat LLMs prompted with subsets of demonstrations as experts and learn a weighting function to optimise the distribution over the continuation. (🧵1/n)

1 year ago
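Summarising the thread in one line (our notation, not necessarily the paper's): the MoICL output is a weighted mixture of the same frozen LLM's predictions under K demonstration subsets D_k, with only the weights w trained.

```latex
p(y \mid x) = \sum_{k=1}^{K} \mathrm{softmax}(w)_k \; p_{\mathrm{LLM}}(y \mid D_k, x)
```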

I'll be travelling to London from Wednesday to Friday for an upcoming event and would be very happy to meet up! 🚀
I'd love to chat about my recent work (DeCoRe, MMLU-Redux, etc.). DM me if you're around! 👋

DeCoRe: arxiv.org/abs/2410.18860
MMLU-Redux: arxiv.org/abs/2406.04127

1 year ago

I would love to be added as well!

1 year ago