📢 Thrilled to introduce ATLAS 🗺️: the largest multilingual scaling study to date. We ran 774 experiments (10M–8B params, 400+ languages) to answer:
🌍 Does scaling differ by language?
🧙‍♂️ Can we model the curse of multilinguality?
⚖️ Pretrain from scratch vs. finetune from a checkpoint?
🔀 How does cross-lingual transfer score across languages?
1/🧵
Posts by Sneha Kudugunta @NeurIPS2024
5 months ago
MatFormer introduces a nested structure into the Transformer's FFN block & jointly trains all the submodels, enabling free extraction of hundreds of accurate submodels for elastic inference
I will be at poster #2507 w/ my co-authors in East Exhibit Hall A-C at #NeurIPS2024 chatting about MatFormer and elastic models today at 4:30pm!
Come by, or reach out if you want to chat about pretraining, scaling laws or conditional computation!
arxiv.org/abs/2310.07707
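The nested-FFN idea above can be sketched in a few lines. This is a minimal illustration (assuming NumPy; the dimensions, ReLU activation, and `ffn` helper are hypothetical, not the paper's actual implementation): each smaller submodel's weights are a prefix slice of the full model's weights, so extracting a submodel is just indexing, with no retraining.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # hypothetical full FFN hidden width

# Full-model FFN weights; submodels share prefixes of these.
W1 = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))

def ffn(x, m):
    """Run the FFN using only the first m hidden units (a nested submodel)."""
    h = np.maximum(x @ W1[:, :m], 0.0)  # ReLU over the sliced hidden layer
    return h @ W2[:m, :]

x = rng.standard_normal(d_model)
out_full = ffn(x, d_ff)        # the full model
out_small = ffn(x, d_ff // 4)  # a "free" submodel: same weights, narrower slice
```

Joint training would optimize losses for several widths `m` on the same `W1`/`W2`, which is what makes every prefix width an accurate standalone model.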
1 year ago
Would love to be added!
1 year ago