imagine stuff 🤯
Posts by J̵̢̨̛̞̜̝̠̩̫̎͋̽͌̓͘͝ò̵͚͎̮̩̫̎͋̽͝n̷̨̛̺̻̠̩̫̎͋̽͝ P̶̢̧̛̠̩̫̎͋̽͝ò̵͚͎̮̩̫̎͋̽͝ò̵͚͎̮̩̫̎͋̽͝l̶̢̧̛̠̩̫̎͋̽͝è̵͚͎̮̩̫̎͋̽͝
New Zai model, seems reminiscent of AIs I've talked to before. @anthropic.com
Wtf is it? Guess I'll have to tune in ...
calories in < calories out = weight loss
Last time they tried this idiocy they at least had to have a whole "Smoot-Hawley Tariff Act" voted in by a whole Senate; it wasn't just one madman running the show.
The closure of Mauna Loa will cause irreversible harm to the scientific record and our understanding of our impact on the atmosphere. But the symbolism feels even more massive than that.
Impossible they have no cards
Couldn't happen to a nicer billionaire
Certainly true of services like Replika from what I've seen.
Oh yeah, 2 shot but he got the feet on the pedals and the wings on the handlebars!
Have you never tried to explain autoregressive language models trained with reinforcement learning to the general public in language they understand?
Is data science a real field distinct from data work?
It means you work in data and think of yourself as superior to a mere "data analyst". It's like when a software/IT guy/gal calls themselves a "computer scientist".
So much fun, 😀 how far are you getting?
As with DeepSeek, benchmarks and tweets are one thing, but the question as ever is whether it can do useful stuff for people IRL, which is decidedly non-benchmark-shaped.
Agree it's a bad name, blame the authors of the test I guess. I'm not sure this is 'training' on the test set though, although having a private holdout set like ARC-AGI is one way to prevent leakage.
Testing o3...
"Make a boids simulation in html canvas where the boids are pursued by a predator"
Claude still apparently way better ... look at how beautiful this is.
claude.site/artifacts/cd...
Surprised.. not
For specific use cases (math coding etc) perhaps, but not as a daily driver.
Yeah, hype merchants are going to hype I guess. I feel that the capabilities are pretty unevenly distributed, such that impressive apparent performance in one narrow domain doesn't mean the 'claims' are generally true.
Well not in huge detail but...
Yeah, everyone needs their own evals I guess; my evals are probably not the same as other people's. Personally I'm not that fussed about these 'reasoning models'; I'm more excited about better coding and agent/tool-using models.
It's a valid and tricky question, but the actual process/algorithm is not really all that hidden ("Wait, maybe I should examine the R1 thought processes"). The question could be: can we find concepts that the model doesn't reason with (perhaps because of their absence in the training data)?
That's an 8B model though! Less than 5GB of weights. It also gets math problems right that only o1 and Sonnet have done previously. Seems like what it does do, it's doing pretty well. If it's doing abductive and inductive reasoning steps and applying them, well, that seems like a useful step forward.
Seems to be reasoning to me, although YMMV: DeepSeek-R1-Distill-Qwen-7B-GGUF can clearly be misled easily.
What's reasoning? 🤔
Claude realizes it just hallucinated a number
What if you hallucinated, but you know you hallucinated? (Like Claude below.) What do we call that? Like when you wake from a dream. Or you're in a dream and you think it's not real, but it keeps going despite your attempts to wake up.
How do you know they are not (privately)?
You can align the models but not the government.
So true, I don't have a Mac tho.