Everything about that Luigi guy is just sad. Manifesto written by an LSTM, no big plan or idea, got caught in the dumbest way imaginable, a miserable health condition… just sad overall
Posts by pranav
what’s the mfu like
Personally I’m even more primitive and know only basic calculus, so the significance of this is totally lost on me. But at the same time I don’t want to do a depth-first search and take 5 years to grok all this either
mom pick me up they’re putting air quotes on integrals now
Does exploration
Falsifiable prediction = respect
Similar to how “Threads should not be a library”
Commits should be first-class objects. VCS is not some outer-loop feature. It is what we do
That’s not even the first one. Just the first good one that didn’t use Hidden Markov Models
Ah that explains your knowledge of dosas finally
Good water supply
grateful for unga buga loss go down technology
🦃🇺🇸💸
Importance sampling does not work in deep learning
This suggests that SGD is a frequentist process, not a Bayesian one
proceedings.mlr.press/v97/byrd19a/...
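For anyone curious about the mechanism being discussed: importance weighting rescales each example's gradient contribution. A minimal sketch on logistic regression with toy data (the dataset, weights, and hyperparameters here are made up for illustration; the linked Byrd & Lipton paper's finding is specifically about deep networks, where the weights' effect washes out over training):

```python
import numpy as np

# Importance-weighted SGD on logistic regression: each example's
# gradient is scaled by its importance weight w_i.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(float)          # label depends on first feature
w_imp = np.where(y == 1, 3.0, 1.0)       # upweight positives 3x (arbitrary)

theta = np.zeros(5)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ theta))           # predicted probabilities
    grad = X.T @ (w_imp * (p - y)) / len(y)        # weighted gradient
    theta -= lr * grad
```

For an unregularized linear model like this, the weights shift the learned decision boundary; the paper's point is that for deep nets trained to convergence, that shift largely disappears.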
hmm what a coincidence this suddenly popped up on the other site
Everybody gangsta until deepseek r1 starts thinking in Chinese
There are papers pipelining along the token dimension.
Agree it’s a little too good to be true, too basic to be new
I read it twice and still don’t understand what the insight is. Might have to read the paper
Looks like Sutton didn’t get the memo. Memory and compute are cheap. It’s called the “Bitter Lesson”
Replay buffers are a-ok!
I now hit cmd + s every breath due to trauma from this
Eric Schmidt has written a second book with Henry Kissinger on AI. Incredible
distributed learning for LLMs?
recently, @primeintellect.bsky.social announced that they finished their 10B distributed training run, trained across the world.
what is it exactly?
🧵
delete this
There’s also BPE dropout
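For context, BPE-dropout stochastically skips merges during tokenization so the same word gets different subword segmentations across epochs. A toy sketch with a made-up merge table (the merge list and probability here are illustrative, not from any real tokenizer):

```python
import random

# Toy BPE-dropout: walk the merge table in priority order, but skip
# each eligible merge with probability p, producing stochastic splits.
merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]  # made-up table

def bpe_dropout(word, p=0.1, rng=random.Random(0)):
    toks = list(word)
    for a, b in merges:                          # fixed merge priority
        i = 0
        while i < len(toks) - 1:
            if toks[i] == a and toks[i + 1] == b and rng.random() >= p:
                toks[i:i + 2] = [a + b]          # apply the merge
            else:
                i += 1
    return toks

print(bpe_dropout("lower", p=0.0))  # p=0 recovers plain BPE: ['lower']
print(bpe_dropout("lower", p=1.0)) # p=1 drops all merges: characters only
```

With intermediate p, each call can return a different segmentation of the same word, which is the regularization effect.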
btw training a 5e25-FLOP model at 50% MFU would take 10k H100s for ~100 days. anything more than that is surplus territory.
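The arithmetic behind that claim, assuming roughly 1e15 FLOP/s BF16 dense peak per H100 (an approximation of the ~989 TFLOP/s spec):

```python
# Back-of-the-envelope: wall-clock = total FLOPs / (peak * MFU * #GPUs).
peak_flops = 1e15        # assumed per-H100 BF16 dense peak, rounded up
mfu = 0.5                # model FLOPs utilization
n_gpus = 10_000
target_flops = 5e25

seconds = target_flops / (peak_flops * mfu * n_gpus)
days = seconds / 86_400
print(f"{days:.0f} days")  # ~116 days, i.e. on the order of 100 days
```

So the "100 days on 10k H100s" figure is consistent to within the rounding on the peak-FLOPs assumption.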
in any case pretty impressive operation!
Wow never would have thought you’d be an options trader. Honestly respect 🫡
The future is properly uncertain now for the first time in a while, so selling CCs might just be the move
Well if pre training is over NVDA is at risk. o1 inference, data selection etc can be done on AMD
Lucky man
Recent reports such as this one, with all the typical thought leaders and podcasters jumping on them, are creating a narrative
www.bloomberg.com/news/article...
Agree but I’ve never seen positive transfer happen so I’m a bit pessimistic.
Not sure earliness of fusion has been the bottleneck here. Do you have reasons why it could be?
It’d be great if we could figure out a pre training objective for YouTube that transfers to text