Seems to this lowly layman that if you want to distance yourself from "human exceptionalism," you need to test the models through means other than these disgusting Homo sapiens benchmarks.
Posts by Vincent Carchidi
I understand the tendency to think people critical of GenAI are given to some kind of "human exceptionalism," but the only basis for claiming GenAI has sophisticated intellectual/cognitive/what-have-you abilities is its performance on characteristically human tasks.
(this day is never happy)
Happy limited DoD budget release day to all who celebrate
I cannot believe that you think my claim about the end of history and the impending obsolescence of the human race is controversial. Is your head that deep in the sand?
Really sorry about this one
Uptime? Nothing, dawg, what's up with you?
Look, when YOU get replaced by AI, it's a skill issue.
When *I* get replaced by AI, it's a humbling reminder of the pace of technology development.
I'm not savvy enough to figure out how to phrase this properly, but I do think the engineering vs. explanation mindset divide is a big part of this. Sort of a difficulty with not viewing every LLM performance in functionalist terms...
Oh yeah, for sure. Which I don't think is necessarily a bad thing, but leads to the kinds of "total factor productivity cope" lines lol.
Do you think this time could be different? Or that the gains (whatever they might be) just don't show up in TFP?
I think I see what you're getting at, but what do you mean by "first" tech revolution?
This is how I feel about the famous 4chan-esque story GPT-3.5 wrote about a guy digging a bottomless pit.
I'm referring to his recent comments on job displacement, which he's been making for a while now. And also his doomerism. I think he and Bengio just really enjoy telling people the end is nigh.
You thought this was gonna be a joke. Shame.
I view him kind of like Dawkins. Great scientist, great in his area, but outside of his area he's self-indulgent and likes the sound of his own voice.
LLMs keep needing more and more and more data, which turns out to be extremely effective for certain things. But that data reliance means they aren't taking these extra steps.
I think it's very different, in a few ways, but the most relevant here I'd say is that humans can find principled ways of generating ideas, building on them, etc. Not that humans never generate ideas in a scattershot way, but that they don't remain within the original 'knowledge enclosure.'
On the most basic level possible, (and I'm not talking about coding and verification here), I view LLMs' responses to open-ended problems in this way. Sometimes they, too, cook - and training has refined their ability to cook - but we do the selection (and that seems to matter).
H/t to @desiderratum.bsky.social for helping me put this together.
Sometimes @horsedisc.bsky.social is just cookin. We've all seen it. But we know horse does not know that it's cooking, and has no principled means of cooking, because half the time it's nonsense. We basically select from responses.
So true, as always.
...less well*
This is great. It's like a well less adjusted @horsedisc.bsky.social.
Most, if not all, of this, by the way, is consistent with pretty much the standard understanding of deep neural networks before LLMs made everyone go insane. I've said this before (as have many others), but the scalability of transformers did not solve the problems in creating a "general" intelligence.
I think the ability to interpolate is remarkable at scale. But we were all waiting on this to tip into robust abstraction and generalization beyond training datasets, and instead what happened was they expanded the training datasets, refined them, and scaffolded the models effectively.
I guess the implicit thought here - on the flip side - is that I've never (or so rarely as to count as flukes) personally seen qualitatively/conceptually interesting work produced by an LLM, nor new and interesting directions for work, etc. The progress just hasn't yielded this.
(In a closed domain, the challenge of inferring intent is artificially restricted. I think there's a sophisticated ability, but also unfathomable amounts of human generated and synthetic data plus a scaffolding that constrains outputs.)
The trick with Claude Code-esque systems is to take the expressiveness of a frontier base model, let it shotgun outputs in response to a prompt, but run the bullets through verifiers before returning an answer to the user, which only works so smoothly in closed domains (crude, but you get my point).