Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023 news.un.org/en/story/202... This number is an underestimate given that only 37 countries reported in 2023.
Posts by Alessandro Sordoni
Yeah I suspected that, interesting!
distributed learning for LLM?
recently, @primeintellect.bsky.social announced that they finished their 10B distributed learning run, trained across the world.
what is it exactly?
🧵
Instead of averaging outer gradients, would fancier model merging techniques (e.g. TIES) apply here?
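To make the question concrete, here is a rough sketch contrasting a plain average of per-worker outer gradients with a TIES-style merge (trim small entries, elect a sign per parameter, average only the entries that agree with it). It assumes each outer gradient is a flat list of floats. The names `mean_merge`, `ties_merge` and `density` are made up for illustration, the final scaling factor used in TIES is left out, and nothing here is claimed to be what any particular training run does.

```python
def mean_merge(deltas):
    """Plain average of per-worker outer gradients (one flat list per worker)."""
    n = len(deltas)
    return [sum(vals) / n for vals in zip(*deltas)]


def ties_merge(deltas, density=0.5):
    """TIES-style merge: trim, elect sign, then average the agreeing entries."""
    num_params = len(deltas[0])
    k = max(1, round(density * num_params))

    # 1) Trim: per worker, keep only the k largest-magnitude entries.
    trimmed = []
    for d in deltas:
        order = sorted(range(num_params), key=lambda j: abs(d[j]))
        keep = set(order[-k:])
        trimmed.append([d[j] if j in keep else 0.0 for j in range(num_params)])

    merged = []
    for vals in zip(*trimmed):
        # 2) Elect sign: the sign of the summed trimmed values for this parameter.
        total = sum(vals)
        if total == 0:
            merged.append(0.0)
            continue
        # 3) Disjoint mean: average only non-zero entries matching the elected sign.
        agreeing = [v for v in vals if v != 0 and (v > 0) == (total > 0)]
        merged.append(sum(agreeing) / len(agreeing))
    return merged


if __name__ == "__main__":
    # Two hypothetical workers, four parameters each.
    workers = [[1.0, -2.0, 0.1, 4.0], [3.0, 2.5, -0.2, -1.0]]
    print("mean:", mean_merge(workers))
    print("ties:", ties_merge(workers, density=0.5))
```

In this toy case the two workers disagree in sign on the second and fourth parameters: the plain average shrinks both towards zero, while the TIES-style merge keeps the larger-magnitude direction. Whether that helps or hurts as an outer update is exactly the open question.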
Having better tools for reviewer and AC assignment would definitely help. Ultimately, reducing the number of reviews per paper while striving for relevance and quality could increase reviewer / AC engagement and free up their time to do better reviews.
Last 5 days to apply for a PhD at #EdinburghNLP!
Deadline: November 25
www.ed.ac.uk/studying/pos...
If you are passionate about:
- adaptive tokenization and memory in foundation models
- modular deep learning
- computational typology
please message me or meet me at #NeurIPS2024!
A sparse mask of attention scores based on VerticalAndSlashAttention and a plot of loss vs sparsity ratio for various methods.
Another nano gem from my amazing student Piotr Nawrot!
A repo & notebook on sparse attention for efficient LLM inference: github.com/PiotrNawrot/...
This will also feature in my #NeurIPS 2024 tutorial "Dynamic Sparsity in ML" with André Martins: dynamic-sparsity.github.io Stay tuned!
Explore zero-shot routing of parameter-efficient experts with Phatgoose arxiv.org/abs/2402.05859 and Arrow arxiv.org/abs/2405.11157 with github.com/microsoft/mttl
👉 github.com/sordonia/pg_mb…
Part of the "Dynamic Sparsity in ML" tutorial at #neurips2024, feedback welcome and join for discussions! 😊