Advertisement · 728 × 90

Posts by Alessandro Sordoni

Post image

Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023 news.un.org/en/story/202... This number is an underestimate given that only 37 countries reported in 2023.

1 year ago 8 3 0 0

Yeah I suspected that, interesting!

1 year ago 0 0 0 0
Post image

distributed learning for LLM?

recently, @primeintellect.bsky.social have announced finishing their 10B distributed learning, trained across the world.

what is it exactly?

🧵

1 year ago 23 6 1 2

Instead of averaging outer gradients would fancier model merging techniques (eg TIES) apply here?

1 year ago 1 0 1 0

having better tools for reviewer and ac assignment would definitely help, ultimately reducing # reviews per paper while striving for relevance and quality could increase reviewer / ac engagement and free up their time to do better reviews

1 year ago 2 0 0 0
Informatics: ILCC: Language Processing, Speech Technology, Information Retrieval, Cognition Study Informatics: ILCC: Language Processing, Speech Technology, Information Retrieval, Cognition at the University of Edinburgh. Our postgraduate degree programmes focus on natural language processin...

Last 5 days to apply for a PhD at #EdinburghNLP!

Deadline: November 25

www.ed.ac.uk/studying/pos...

If you are passionate about:

- adaptive tokenization and memory in foundation models
- modular deep learning
- computational typology

please message me or meet me at #NeurIPS2024!

1 year ago 20 8 0 0
A sparse mask of attention scores based on VerticalAndSlashAttention and a plot of loss vs sparsity ratio for various methods.

A sparse mask of attention scores based on VerticalAndSlashAttention and a plot of loss vs sparsity ratio for various methods.

Another nano gem from my amazing student
Piotr Nawrot!

A repo & notebook on sparse attention for efficient LLM inference: github.com/PiotrNawrot/...

This will also feature in my #NeurIPS 2024 tutorial "Dynamic Sparsity in ML" with André Martins: dynamic-sparsity.github.io Stay tuned!

1 year ago 42 8 2 3

Explore zero-shot routing of parameter-efficient experts with Phatgoose arxiv.org/abs/2402.05859 and Arrow arxiv.org/abs/2405.11157 w. github.com/microsoft/mttl

👉 github.com/sordonia/pg_mb…

Part of "Dynamic Sparsity in ML" tuto #neurips2024, feedback welcome and join for discussions! 😊

1 year ago 5 1 0 0