From a learning theory perspective, how do self-play (or other synthetic-data) methods like AlphaZero achieve superhuman performance without “training data”?
Treat yourself to this really fun attempt to tackle this and other foundational questions by reframing information in terms of a computationally bounded observer.
Posts by David Andrzejewski
I wrote a very long post on what a machine learning artifact is, how files work, safetensors, pickle, and a lot more, as a way to understand the new GGUF local LLM file format.
vickiboykis.com/2024/02/28/g...
Oops, working link: arxiv.org/abs/2310.15916
"In-Context Learning Creates Task Vectors" does some nice state substitution/patching experiments to tease apart function construction/selection vs function application in ICL - fun stuff huggingface.co/papers/2310....
[Image description] A man with grey hair and a black suit looks calmly at a futuristic robot with intricate designs and glowing red eyes. The robot seems to be whispering something in the man's ear. On the right side of the image, there's a speech bubble from a notification prompting to continue watching "Seinfeld."
Frequent SF (Nob Hill) sighting recently: astonished & delighted tourists taking photos/videos of the self-driving cars.