
Posts by pranav

Everything about that Luigi guy is just sad. Manifesto written by an LSTM, no big plan or idea, got caught in the dumbest way imaginable, miserable health condition... just sad overall

1 year ago 9 0 1 0

what’s the MFU like?

1 year ago 5 0 0 0
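(MFU here is model FLOPs utilization: the fraction of the hardware's peak throughput actually spent on the model's math. Using the standard definition:)

$$\text{MFU} = \frac{\text{model FLOPs per token} \times \text{tokens per second}}{N_{\text{GPUs}} \times \text{peak FLOPs per second per GPU}}$$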

Personally I’m even more primitive and know basic calculus only. So the significance of this is totally lost on me. But at the same time I don’t want to do a depth first search and take 5 years to grok all this either

1 year ago 6 0 0 0

mom pick me up they’re putting air quotes on integrals now

1 year ago 8 0 1 0

Does exploration

1 year ago 1 0 0 0

Falsifiable prediction = respect

1 year ago 3 0 0 0

Similar to how “Threads cannot be implemented as a library”

1 year ago 1 0 0 0

Commits should be first class objects. VCS is not some outer loop feature. It is what we do

1 year ago 3 0 1 0

That’s not even the first one. Just the first good one that didn’t use Hidden Markov Models

1 year ago 1 0 1 0
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. A. Graves, S. Fernández, F. Gomez, J. Schmidhuber. Proceedings of the 23rd International Conference on Machine Learning, 2006. Cited by 7,222.

scholar.google.co.uk/citations?vi...

1 year ago 0 0 1 0
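For context, the CTC loss from that paper is alignment-free: it marginalizes over all monotonic alignments between input frames and the label sequence, so no HMM-style forced alignment is needed. A minimal PyTorch sketch of how it's typically called (shapes and values here are illustrative, not from the post):

```python
import torch
import torch.nn as nn

# CTC (Graves et al., 2006): label unsegmented sequences without
# per-frame alignments. Index 0 is the blank token by convention.
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

T, N, C = 50, 4, 28  # input frames, batch size, classes incl. blank
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(-1)
targets = torch.randint(1, C, (N, 12))             # label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # fully differentiable; no separate HMM training stage
```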

Ah that explains your knowledge of dosas finally

1 year ago 0 0 0 0

Good water supply

1 year ago 1 0 0 0

grateful for unga buga loss go down technology

🦃🇺🇸💸

1 year ago 4 0 0 0

Importance sampling does not work in deep learning

This suggests that SGD is a frequentist process, not a Bayesian one

proceedings.mlr.press/v97/byrd19a/...

1 year ago 3 0 0 0
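The linked paper (Byrd & Lipton, 2019) is specifically about importance weighting, i.e. scaling each example's loss term by a weight. A minimal sketch of the setup they probe, with all names and sizes hypothetical:

```python
import torch
import torch.nn as nn

# Importance weighting per Byrd & Lipton (2019): scale each example's
# loss by a weight w_i. Their finding: for deep nets trained long
# enough (especially without strong regularization), the weights'
# effect on the learned decision boundary largely washes out.
model = nn.Linear(10, 2)                  # stand-in for a deep net
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))
w = torch.where(y == 1, 10.0, 1.0)        # upweight class 1 by 10x

logits = model(x)
per_example = nn.functional.cross_entropy(logits, y, reduction="none")
loss = (w * per_example).mean()           # importance-weighted loss
loss.backward()
opt.step()
```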

hmm what a coincidence this suddenly popped up on the other site

1 year ago 4 0 1 0

Everybody gangsta until DeepSeek R1 starts thinking in Chinese

1 year ago 8 0 0 0

There are papers that pipeline along the token dimension (TeraPipe, for example).
Agree it’s a little too good to be true, too basic to be new

1 year ago 0 0 0 0

I read it twice and still don’t understand what the insight is. Might have to read the paper

1 year ago 0 0 0 0

Looks like Sutton didn’t get the memo. Memory and compute are cheap. It’s called the “Bitter Lesson”

Replay buffers are a-ok!

1 year ago 1 0 0 0
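In the memory-is-cheap spirit, a replay buffer really is just a big bounded queue you sample from uniformly; a minimal sketch (capacity and fields are illustrative):

```python
import random
from collections import deque

# Minimal experience replay: store transitions, sample uniformly.
# Memory is cheap, so capacity can be large.
class ReplayBuffer:
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)  # oldest evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
buf.push(0, 1, 1.0, 1, False)
batch = buf.sample(1)
```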

I now hit cmd + s every breath due to trauma from this

1 year ago 1 0 0 0

Eric Schmidt has written a second book with Henry Kissinger on AI. Incredible

1 year ago 4 0 1 0

distributed learning for LLM?

recently, @primeintellect.bsky.social announced they’d finished training a 10B model, distributed across the world.

what is it exactly?

🧵

1 year ago 23 6 1 2
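That run (INTELLECT-1) reportedly used a DiLoCo-style recipe via their OpenDiLoCo framework: each worker takes many local optimizer steps, and workers synchronize only once per round by averaging a pseudo-gradient. A single-process sketch of that outer loop, with all sizes and hyperparameters illustrative:

```python
import copy
import torch
import torch.nn as nn

# DiLoCo-style local SGD, simulated in one process: H inner AdamW
# steps per worker with zero communication, then one outer step that
# averages the pseudo-gradient (global - local) across workers and
# applies it with Nesterov momentum, as in the DiLoCo paper.
H, num_workers = 100, 4
global_model = nn.Linear(16, 16)          # toy stand-in for a 10B LLM
outer_opt = torch.optim.SGD(global_model.parameters(),
                            lr=0.7, momentum=0.9, nesterov=True)

for outer_round in range(3):
    workers = [copy.deepcopy(global_model) for _ in range(num_workers)]
    for w in workers:
        inner_opt = torch.optim.AdamW(w.parameters(), lr=1e-3)
        for _ in range(H):                # H local steps, no comms
            x = torch.randn(8, 16)
            loss = (w(x) - x).pow(2).mean()   # toy objective
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
    # Communicate once per round: averaged pseudo-gradient -> outer step.
    for gp, *wps in zip(global_model.parameters(),
                        *(w.parameters() for w in workers)):
        gp.grad = gp.detach() - torch.stack(
            [p.detach() for p in wps]).mean(0)
    outer_opt.step()
```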

delete this

1 year ago 21 0 0 0

There’s also BPE dropout

1 year ago 2 0 1 0
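BPE dropout (Provilkov et al., 2020) regularizes tokenization by randomly vetoing merges at encoding time, so the same word gets segmented differently across epochs. A toy sketch of the idea (not any real tokenizer's API):

```python
import random

# BPE-dropout sketch: skip each eligible merge with probability p,
# yielding stochastic segmentations of the same word.
def bpe_encode(word, merges, p=0.1):
    tokens = list(word)
    while True:
        # highest-priority adjacent pair that survives dropout
        candidates = [
            (merges[pair], i)
            for i, pair in enumerate(zip(tokens, tokens[1:]))
            if pair in merges and random.random() >= p
        ]
        if not candidates:
            return tokens
        _, i = min(candidates)            # lowest rank merges first
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]

merges = {("l", "o"): 0, ("lo", "w"): 1}  # toy merge table
print(bpe_encode("low", merges, p=0.5))   # e.g. ['low'] or ['l','o','w']
```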

btw training a 5e25-FLOP model at 50% MFU would take 10k H100s for 100 days. anything more than that is surplus territory.

in any case pretty impressive operation!

1 year ago 1 0 0 0
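That estimate checks out if you assume roughly 1e15 dense BF16 FLOP/s of peak per H100 (989 TFLOPS; the exact peak depends on dtype and sparsity assumptions):

```python
# Back-of-envelope check, assuming ~1e15 dense BF16 FLOP/s per H100.
peak_flops_per_gpu = 1e15   # FLOP/s (989 TFLOPS, rounded)
mfu = 0.5                   # model FLOPs utilization
gpus = 10_000
days = 100

total = peak_flops_per_gpu * mfu * gpus * days * 86_400  # s/day
print(f"{total:.2e}")       # 4.32e+25, i.e. ~5e25 FLOPs as claimed
```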

Wow never would have thought you’d be an options trader. Honestly respect 🫡

The future is proper uncertain again after a while, so selling CCs (covered calls) might just be the move

1 year ago 1 0 1 0

Well, if pre-training is over, NVDA is at risk. o1 inference, data selection, etc. can be done on AMD

1 year ago 0 0 1 0
Advertisement

Lucky man

1 year ago 0 0 1 0
OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
Three of the leading artificial intelligence companies are seeing diminishing returns from their costly efforts to develop newer models.

Recent reports like this one, plus all the usual thought leaders and podcasters jumping on them, are creating a narrative

www.bloomberg.com/news/article...

1 year ago 1 0 1 0

Agree, but I’ve never seen positive transfer happen, so I’m a bit pessimistic.

Not sure earliness of fusion has been the bottleneck here; do you have reasons why it could be?

It’d be great if we could figure out a pre-training objective for YouTube that transfers to text

1 year ago 2 0 1 0