
Posts by Ethan

Reminds me of the SolidGoldMagikarp

1 year ago 1 0 0 0
Preview
Differentiable Image Parameterizations: A powerful, under-explored tool for neural network visualizations and art.

distill.pub/2018/differe...

1 year ago 1 0 0 0
Video

this is so cool

1 year ago 3 0 1 0
Post image

it's crazy to me that RoPE's issue with BF16 wasn't noticed earlier.
For a reasonable N of 2048, these are the computed frequencies prior to cos(x) & sin(x), fp32 above and bf16 below.
Given how short the period of simple trig functions is, this difference is catastrophic for large values.
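The precision loss is easy to reproduce. Below is a minimal numpy sketch of the idea: bf16 is simulated by truncating a float32 mantissa to 7 bits (real bf16 rounds to nearest, so this slightly overstates the error), and the RoPE angles m · θᵢ are compared in both precisions. The variable names and the d=64 / base=10000 choices are illustrative, not from the original post.

```python
import numpy as np

def to_bf16(x):
    # Simulate bfloat16 by zeroing the low 16 bits of a float32:
    # bf16 keeps the 8-bit exponent but only 7 mantissa bits.
    x = np.ascontiguousarray(x, dtype=np.float32)
    bits = x.view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

# RoPE: theta_i = base^(-2i/d), angle(m, i) = m * theta_i
d, base, N = 64, 10000.0, 2048
inv_freq = (base ** (-np.arange(0, d, 2, dtype=np.float32) / d)).astype(np.float32)
pos = np.arange(N, dtype=np.float32)

angles_fp32 = np.outer(pos, inv_freq)          # shape (N, d//2)
err = np.abs(angles_fp32 - to_bf16(angles_fp32))

# With only 7 mantissa bits, angles near 2048 are representable only in
# steps of 8 radians -- while sin/cos repeat every 2*pi (~6.28) radians,
# so the rotary embedding at large positions is essentially garbage.
print("max angle error (radians):", err.max())
```

For example, the position index 2047 itself truncates to 2040 in this scheme, an error larger than a full period of sin/cos.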

1 year ago 8 1 1 1


It’s an unstoppable force, and all I can say is don’t hate the player (especially not the underdog), hate the game. Whether or not HF released this dataset, your data is being used; you may as well also have access to its collection.

1 year ago 2 0 0 0

This is the big one and I can’t stress this enough. All of your data everywhere is being gathered and used anyway by private actors. The only fire you can fight back with is to play on that same field and democratize it. This anger is way mistargeted.

1 year ago 6 0 1 0
Post image
1 year ago 4 0 1 0

Aren’t there datasets just like this for Twitter and everything else imaginable? Idk why this is suddenly taboo; most people making these datasets aren’t even sharing them publicly

1 year ago 9 0 3 0

🙄

1 year ago 0 0 0 0
Preview
GitHub - ethansmith2000/fsdp_optimizers: supporting pytorch FSDP for optimizers

github.com/ethansmith20...

1 year ago 1 0 0 0
Post image

Just added FSDP2 support for MARS and Muon!

1 year ago 8 2 1 0

that's what they all say

1 year ago 0 0 0 0
Post image
1 year ago 3 0 1 0

Awesome list, thanks!

1 year ago 1 0 0 0
Post image

Excellent writeup on GPU streams / CUDA memory
dev-discuss.pytorch.org/t/fsdp-cudac...

TLDR: by default, memory belongs to the stream that allocated it. To share it across streams:
- `Tensor.record_stream` -> automatic, but can be suboptimal and nondeterministic
- `Stream.wait` -> manual, but precise control

1 year ago 29 1 2 0

Incredible to see what are likely SOTA results coming out of open source with full reproducibility!
Happy to have helped provide the compute for this and hoping to support more awesome research like this!

1 year ago 11 0 0 0

First, my sincerest thanks to @leonardoai.bsky.social with the help of
@ethansmith2000.com for generously providing H100s to support this research and enable this release. Y'all rock, thanks so much! <3

1 year ago 2 1 1 0

Absolutely sick!

1 year ago 2 0 0 0
Post image

New NanoGPT training speed record: 3.28 FineWeb val loss in 4.66 minutes

Previous record: 5.03 minutes
Changelog:
- FlexAttention blocksize warmup
- hyperparameter tweaks

1 year ago 33 3 2 1

i'm trying to follow as many of my old moots as possible and new people as i find them. some of y'all changing your pfp is just mean spirited (im lazy and learned people's pfps not names)

1 year ago 36 1 8 0

Greetings xjdr

1 year ago 4 0 0 0

Untuned SOAP beats tuned AdamW at every single step

1 year ago 6 1 0 0

Yes @hessianfree.bsky.social can speak more to this

1 year ago 2 0 0 0
Post image

AdamW's been tuned but SOAP and PSGD are just using default params, you love to see it.

1 year ago 9 1 1 1

There’s a void of PSGD hype that needs to be filled here

1 year ago 1 0 0 0

I goofed and never tested distributed saving, but now it works!
It was a little annoying, as both SOAP and PSGD maintain preconditioners as lists of varying length, which fail to pickle. To fix this I hardcoded a max of 4 (based on conv layers being 4D tensors).
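A hypothetical sketch of that kind of workaround (names and structure are mine, not from the repo): pad every per-parameter preconditioner list out to a fixed length so each saved entry has an identical shape, then strip the padding on load. This assumes the placeholder value survives your serialization format.

```python
# Conv weights are at most 4-D, so 4 preconditioners per parameter suffices.
MAX_PRECONDS = 4

def pad_preconds(preconds, max_len=MAX_PRECONDS):
    """Pad a variable-length preconditioner list to a fixed length
    so every state-dict entry has the same structure."""
    assert len(preconds) <= max_len, "parameter has too many dims"
    return list(preconds) + [None] * (max_len - len(preconds))

def unpad_preconds(padded):
    """Drop the padding when restoring optimizer state."""
    return [p for p in padded if p is not None]
```

The round trip is lossless as long as real preconditioners are never `None`.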

1 year ago 6 0 0 0

I’ve generally preferred research to software engineering, but I’m developing a liking for building the tools used for research

1 year ago 12 0 1 0

Fancy seeing you here 👋

1 year ago 1 0 0 0

Self-proclaimed hessianfree guy going back on his word

1 year ago 0 0 1 0

Haven’t tested, but should be typical FSDP experience.

1 year ago 0 0 0 0