
Posts by Roy Frostig

And for diffusion specifically, there are several reference models implemented at github.com/AI-Hypercomp..., in case you hadn't come across it already and it suits some of your needs.

1 year ago

are useful beyond that. We have good input here already, but DM or email me if you'd ever like to talk with the team some more. Either way we appreciate it!

1 year ago

Indeed, it's great to hear from you! And thanks for all of this detail. I've shared it with team members who've started working on models. You'll see more on the transformers side at first, since that's already underway (and e.g. relates to the book upthread), but your points on diffusion and GNNs ...

1 year ago

@nmboffi.bsky.social – We have some plans to improve that this year. As examples, do you have any models in particular that you'd really like to see? Does training, tuning, inference, or anything else matter most to you? What hardware?

1 year ago

Training our most capable Gemini models relies heavily on our JAX software stack and Google's TPU hardware platforms.

If you want to learn more, see this awesome book, "How to Scale Your Model":

jax-ml.github.io/scaling-book/

Put together by several of my Google DeepMind colleagues listed below 🎉.

1 year ago

Our online book on systems principles of LLM scaling is live at jax-ml.github.io/scaling-book/

We hope that it helps you make the most of your computing resources. Enjoy!

1 year ago