There's a chance they IPO to high valuations this year and then those valuations fall a bit as adoption slows and revenue growth stabilizes. Wouldn't really call that a bubble though.
Posts by Sam Harsimony
Recently been feeling less concerned about a bubble (at least for the next ~year). Companies are slowing their compute spend a bit and focusing more on the bottom line.
Still think they won't be able to charge crazy profit margins (which is good for us).
bsky.app/profile/hars...
It sure would be nice to have a theorem proving that this is the case in general with NNs.
It seems like far-OOD behavior should be random, right?
By and large, when AIs fail, they fail in quirky ways. Not catastrophic, just weird/funny.
Very good that the current training paradigm does silly things when exposed to OOD inputs rather than becoming a paperclip maximizer.
AES used Maximo robots to install 100 MW at Bellefield, with crews fitting ~24 photovoltaic modules per person per hour; Civ Robotics' CivDot marks ~3,000 layout points daily with ~8 mm accuracy and 100+ units are in the field.
I've been waiting to update this thread once DeepSeek V4 comes out. Neglected to include MiniMax but will add!
bsky.app/profile/hars...
Both of course! Though I'm optimistic that in the long term the downsides can be addressed.
Promising initial results on an mRNA vaccine for pancreatic cancer!
www.nbcnews.com/health/cance...
vaccines remain a highly underrated technology
bsky.app/profile/hars...
I have a theory that AI companies will realize that brute scaling isn't as profitable and will pivot to specialization.
bsky.app/profile/hars...
Yeah my assumption is that the gap will stay relatively constant. And it's larger than what the benchmarks would suggest, maybe like 6-9 months?
But consequences are the same, lots of competition in AI inference will make it cheap.
A TLDR is that unless the training dynamics of leading LLMs change or open model builders run out of money, this ~6 month performance gap from closed to open models is here to stay.
www.interconnects.ai/p/reading-to...
Interesting. I don't really get why they're trying to push up profits and anger users at this point in time. Seems better to build up a lot of goodwill going into their IPO?
On artificial analysis, the input cost increased 77% from 4.6 -> 4.7 BUT the reasoning was more efficient, so lower cost overall for 4.7.
Curious how this is going to impact prices programmers are paying on net.
If you change the tokenizer to use 46% more input tokens, is that not just a sneaky way to implement a 46% price increase?
bsky.app/profile/simo...
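A hedged back-of-the-envelope sketch of the point above (the 46% figure is from the thread; the per-token price and token counts are made-up illustrative numbers): if a new tokenizer emits 1.46× the input tokens at an unchanged sticker price, the input bill rises 46% all the same.

```python
# Back-of-the-envelope input-cost comparison. All numbers except the
# 46% token inflation are hypothetical, for illustration only.
def input_cost(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost for `tokens` input tokens at `price_per_mtok` $/1M tokens."""
    return tokens / 1_000_000 * price_per_mtok

PRICE = 3.00             # hypothetical $/1M input tokens, unchanged across versions
old_tokens = 1_000_000   # tokens the old tokenizer produces for some corpus
new_tokens = int(old_tokens * 1.46)  # new tokenizer emits 46% more tokens

old_cost = input_cost(old_tokens, PRICE)
new_cost = input_cost(new_tokens, PRICE)
print(f"effective input price increase: {new_cost / old_cost - 1:.0%}")  # 46%
```

Whether the bill goes up on net still depends on output-side efficiency, which is the 4.6 → 4.7 point above: more expensive input tokens, but fewer reasoning tokens overall.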
Last year, we introduced FlexOlmo, a novel way to train parts of a model independently then combine them later.
BAR builds on that idea for a harder problem: how to keep improving a model without having to retrain each time. 🧵
I'm impressed with Kimi because they've increased capabilities *without* increasing memory footprint.
None of the other Chinese labs can say the same. And I suspect that Google, Anthropic, and OAI are also pushing up model sizes (and operating costs).
"A vaccine during pregnancy which protects newborns against nasty chest infections is cutting hospital admissions of babies by more than 80%, UK health officials say."
www.bbc.com/news/article...
It's great news for those in the ~40 countries that aren't NZ.
In NZ, it is not Medsafe-authorised.
Kimi 2.6 is now available on @hf.co 🔥🎉
huggingface.co/moonshotai/K...
✨ 1T MoE / 32B active / 256K context
✨ Agent Swarm: 300 sub-agents × 4,000 steps
✨ Modified MIT
Yup, this is just rent control in a different format.
Perhaps we should replace the word profit with "making things that people really want and that nobody else is making"
Folks, infrasound issues are fake. This was truly an insane experience to write and I hope you enjoy it: blog.andymasley.com/p/contra-ben...
- Large Sample Covariance Matrices and High-Dimensional Data Analysis, Jianfeng Yao, Shurong Zheng, Zhidong Bai
- Quantum Computing Since Democritus, Scott Aaronson
- All of Statistics, Larry Wasserman
From a skim of the paper seems like there are multiple attention heads but it's not an MoE architecture.
Some neat work on stabilizing looped transformer models.
Looped (or universal) transformers are interesting because they shrink the memory footprint and thus lower inference costs.
sandyresearch.github.io/parcae/
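A rough sketch of why looping shrinks the memory footprint (toy dimensions, not any real model's config): a standard N-layer transformer stores N distinct sets of block weights, while a looped/universal transformer stores one block and applies it N times, so weight memory is roughly 1/N of the stack (embeddings aside).

```python
# Toy parameter-count comparison; dimensions are illustrative, not a real config.
d_model, d_ff = 1024, 4096

# Approximate parameters in one transformer block:
# four attention projections (Q, K, V, O) plus two feed-forward matrices.
per_layer = 4 * d_model * d_model + 2 * d_model * d_ff

n_layers = 24
standard = n_layers * per_layer  # distinct weights for every layer
looped = per_layer               # one shared block, applied n_layers times

print(f"standard: {standard / 1e6:.1f}M params, looped: {looped / 1e6:.1f}M params")
print(f"weight-memory ratio: {standard // looped}x")
```

Activations and KV cache still scale with the number of loop iterations, so the savings are specifically in stored weights, which is what drives serving memory and thus inference cost.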
I wouldn't be all that surprised to see more of this going into the future. If transformers have some cap on capabilities, I would expect models that specialize in one area to get worse in others by necessity. If it takes truly enormous transformers to get AGI, anything less must be specialized.
bsky.app/profile/did:...
New post! I think aligned data, safeguards, defensive technologies, and law will lower AI risks enough that we can move forward with its development.
splittinginfinity.substack.com/p/training-o...
I don't actually get why people don't block tankies on sight and feel like they need to refute them. We ought to be treating them as similarly revolting to the gooner pedo frog Nazis we block on sight on Twitter.