
Posts by Abhimanyu Hans

Let’s sanity-check DeepSeek’s claim of training on 2,048 GPUs for under 2 months, at a cost of $5.6M. It sort of checks out and sort of doesn’t.

The v3 model is an MoE with 37B (out of 671B) active parameters. Let's compare to the cost of a 34B dense model. 🧵

1 year ago
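A back-of-envelope version of that sanity check can be sketched as below. All of the inputs beyond the post itself are assumptions: roughly 14.8T pretraining tokens, the standard ~6·N·D training-FLOPs estimate with N the active parameter count, an H800 BF16 peak around 989 TFLOPS, ~40% model FLOPs utilization, and a $2/GPU-hour rental rate.

```python
# Back-of-envelope sanity check of the reported DeepSeek-V3 training cost.
# Assumptions (not from the post): ~14.8T pretraining tokens, training FLOPs
# ~= 6 * N_active * D, H800 BF16 peak ~989 TFLOPS, ~40% MFU, $2/GPU-hour.

active_params = 37e9       # active parameters per token (MoE, 37B of 671B)
tokens        = 14.8e12    # assumed pretraining token count
flops         = 6 * active_params * tokens        # total training FLOPs

peak_flops = 989e12        # assumed H800 BF16 dense peak, FLOP/s
mfu        = 0.40          # assumed model FLOPs utilization

gpu_hours    = flops / (peak_flops * mfu) / 3600  # required GPU-hours
cost         = gpu_hours * 2.0                    # at $2/GPU-hour
days_on_2048 = gpu_hours / 2048 / 24              # wall clock on 2048 GPUs

print(f"{flops:.2e} FLOPs, {gpu_hours/1e6:.1f}M GPU-hours, "
      f"${cost/1e6:.1f}M, {days_on_2048:.0f} days on 2048 GPUs")
```

Under these assumptions the estimate lands in the same ballpark as the claim: a few million dollars of GPU rental and well under two months of wall-clock time on 2,048 GPUs.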

Absolutely!

1 year ago

poster sent for print 😮‍💨

are you concerned your LLM might regurgitate exact training data to your users?

join me and my co-authors at #NeurIPS2024 in Wednesday's 1st poster session & learn how goldfish loss can help you.

eager to meet friends from past and future!

p.s. hmu if you're hiring a summer intern!

1 year ago
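The core idea behind the goldfish loss, dropping a pseudorandom subset of token positions from the next-token loss so the model never receives supervision on a complete passage and so cannot reproduce it verbatim, can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the hash scheme, context width `h`, and drop rate `1/k` here are illustrative choices.

```python
# Toy sketch of a goldfish-style loss mask (illustrative, not the paper's
# exact method): deterministically drop ~1/k of positions from the loss.
import hashlib

def goldfish_mask(token_ids, k=4, h=13):
    """Return a per-position mask: 1 = token contributes to the loss,
    0 = token is dropped. A position is dropped when a hash of the h
    preceding tokens lands in bucket 0 of k, so the same passage is
    masked identically every time it is seen."""
    mask = []
    for i in range(len(token_ids)):
        ctx = tuple(token_ids[max(0, i - h):i])
        digest = hashlib.sha256(repr(ctx).encode()).digest()
        mask.append(0 if digest[0] % k == 0 else 1)
    return mask

ids = list(range(100, 140))          # a pretend token sequence
m = goldfish_mask(ids)
print(sum(m), "of", len(m), "positions contribute to the loss")
```

In training, this mask would multiply the per-token cross-entropy before averaging; because the mask is a deterministic function of the local context, repeated copies of a document are always masked the same way.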