Posts by Jon Poole


imagine stuff 🤯

6 months ago 1 0 0 0

New Zai model, seems reminiscent of AIs I've talked to before. @anthropic.com

8 months ago 1 0 0 0

Wtf is it? Guess I'll have to tune in...

9 months ago 2 0 0 0

calories in < calories out = weight loss

1 year ago 0 0 0 0

Last time they tried this idiocy they at least had to have a whole "Smoot-Hawley Tariff Act" voted in by a whole Senate; it wasn't just one madman running the show.

1 year ago 9 0 0 0

The closure of Mauna Loa will cause irreversible harm to the scientific record and our understanding of our impact on the atmosphere. But the symbolism feels even more massive than that.

1 year ago 84 53 7 1

Impossible they have no cards

1 year ago 2 0 0 0

Couldn't happen to a nicer billionaire

1 year ago 3 0 0 0
Yes, Claude Code can decompile itself. Here's the source code. Hello fellow blue-teamers and masters of "tradecraft", the AI revolution in software engineering has been called - here's what you should know. Whilst I haven't been active since 1995 (I was 13 and was deported from Hong Kong) it's a small world out there and what follows are notes I recently shared with a red-teamer. Blue teamers can't see your prompts or what you are working on when using Cursor for Business. If your organization uses Cursor for Business then the administrators of the org

📰 Yes, Claude Code can decompile itself. Here's the source code.

1 year ago 35 5 2 2

Certainly true of services like Replika from what I've seen.

1 year ago 1 0 0 0

Oh yeah, 2 shot but he got the feet on the pedals and the wings on the handlebars!

1 year ago 4 0 0 0

Have you never tried to explain autoregressive language models trained with reinforcement learning to the general public in language they understand?

1 year ago 1 0 0 0

Is data science a real field distinct from data work?

1 year ago 0 0 1 0

Means you work in data and think of yourself as superior to a mere "data analyst". It's like when a software/IT guy/gal calls themselves a "computer scientist".

1 year ago 0 0 1 1

So much fun 😀 How far are you getting?

1 year ago 0 0 0 0

As with DeepSeek, benchmarks and tweets are one thing, but the question as ever is whether it can do useful stuff for people IRL, which is decidedly non-benchmark-shaped.

1 year ago 1 0 0 0

Agree it's a bad name, blame the authors of the test I guess. I'm not sure this is 'training' on the test set though, although having a private holdout set like ARC-AGI is one way to prevent leakage.

1 year ago 1 0 1 0

Testing o3...

"Make a boids simulation in html canvas where the boids are pursued by a predator"

Claude still apparently way better ... look at how beautiful this is.

claude.site/artifacts/cd...

1 year ago 0 0 0 0

Surprised.. not

1 year ago 0 0 0 0

For specific use cases (math, coding, etc.) perhaps, but not as a daily driver.

1 year ago 0 0 0 0

Yeah, hype merchants are going to hype I guess. I feel the capabilities are pretty unevenly distributed, so impressive apparent performance in one narrow domain doesn't mean the 'claims' are generally true.

1 year ago 1 0 0 0

Well not in huge detail but...

Yeah, everyone needs their own evals I guess; my evals are probably not the same as other people's. Personally I'm not that fussed about these 'reasoning models'; I'm more excited about better coding and agent/tool-using models.

1 year ago 0 0 1 0

It's a valid and tricky question, but the actual process/algorithm is not really all that hidden ("Wait, maybe I should examine the R1 thought processes"). The question could be: can we find concepts that the model doesn't reason with (perhaps because of their absence in the training data)?

1 year ago 0 0 1 0

That's an 8B model though! Less than 5GB of weights. It also gets math problems right that only o1 and Sonnet have done previously. Seems like what it does do, it's doing pretty well. If it's doing abductive and inductive reasoning steps and applying them, well, that seems like a useful step forward.

1 year ago 0 0 1 0

Seems to be reasoning to me, although YMMV: DeepSeek-R1-Distill-Qwen-7B-GGUF can clearly be misled easily.

1 year ago 0 0 1 0

What's reasoning? 🤔

1 year ago 0 0 2 0
Claude realizes it just hallucinated a number

What if you hallucinated, but you know you hallucinated (like Claude below)? What do we call that? Like when you wake from a dream. Or you're in a dream and you think it's not real, but it keeps going despite your attempts to wake up.

1 year ago 0 0 0 0

How do you know they are not (privately) ?

1 year ago 1 0 1 0

You can align the models but not the government.

1 year ago 0 0 0 0

So true, I don't have a Mac tho.

1 year ago 0 0 0 0