(@coolstuffdude) Bsky

also a claude distillation benchmark lol

1 month ago 1 0 0 0

i'm glad someone went ahead and built it! great work bsky.app/profile/cool...

1 month ago 1 0 0 0

aha! i had to come back to this thread too @faineg.bsky.social you have anything to do with this 🤔

1 month ago 2 0 0 0

i wanted to disagree with this but i do think joe rogan caused more societal harm than all ai companions have

i think the answer is sadly that ai companions tend to make people insane and outcasts vs influencers make people feel like part of a group

we need to keep making them seem cringe

2 months ago 4 0 0 0

which one is more of a girl game, dragon age origins or baldurs gate 3

2 months ago 2 0 1 0

agreed! even with classifier llms there are so many optimizations when doing things like this

with
- almost no latency requirements
- tiny, fixed max input lengths
- tiny 1-5 token output lengths
- really small models

you can batch super heavily and get a lot of work done on private hardware
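
a rough sketch of what that batching could look like. everything here is an assumption for illustration: `classify_batch` is a hypothetical stand-in for a small model that returns one short label per input, and the cap and batch size are made-up values.

```python
from typing import Callable, Iterator

MAX_INPUT_CHARS = 2000  # assumed fixed cap on input length (hypothetical value)
BATCH_SIZE = 64         # assumed batch size; tune for your hardware


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_all(
    texts: list[str],
    classify_batch: Callable[[list[str]], list[str]],
) -> list[str]:
    """Truncate every input to the fixed cap, then classify in large batches.

    `classify_batch` is a hypothetical stand-in for a small model that
    returns one short label per input.
    """
    truncated = [t[:MAX_INPUT_CHARS] for t in texts]
    labels: list[str] = []
    for chunk in batched(truncated, BATCH_SIZE):
        labels.extend(classify_batch(chunk))
    return labels


def toy_classifier(batch: list[str]) -> list[str]:
    """Toy stand-in: label by keyword instead of calling a model."""
    return ["yes" if "spam" in text else "no" for text in batch]
```

since nothing is waiting on a single answer, the batch size can be as large as memory allows.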

2 months ago 0 0 0 0

It's like the bell curve meme

agents are just a while loop
agents are flow charts and prompts and rag and personas and planners and scientists
agents are just a while loop

i think it is a fun journey to ride this bell curve and really get a feel for why
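
a minimal sketch of the "just a while loop" end of the curve. `fake_model` and the `upper` tool are hypothetical stand-ins so the loop runs without a real LLM.

```python
from typing import Callable

# Hypothetical tools the "model" can call.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
}


def fake_model(history: list[str]) -> dict[str, str]:
    """Stand-in for an LLM call: use a tool once, then finish."""
    if not any(line.startswith("tool:") for line in history):
        return {"action": "upper", "input": history[0]}
    return {"action": "finish", "input": history[-1].removeprefix("tool:")}


def run_agent(task: str, max_steps: int = 10) -> str:
    """Ask the model what to do, run the tool, repeat until it finishes."""
    history = [task]
    steps = 0
    while steps < max_steps:  # the whole "agent" is this loop
        decision = fake_model(history)
        if decision["action"] == "finish":
            return decision["input"]
        result = TOOLS[decision["action"]](decision["input"])
        history.append(f"tool:{result}")
        steps += 1
    return "gave up"
```

the flow charts, planners and personas from the middle of the curve all end up living inside the model call or the tool table.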

2 months ago 4 0 1 0

Seeing these types of things makes me appreciate that I've been following along with the growth of AI and learning and trying things as soon as they're released. It took a long time!

I can't imagine the pressure people feel seeing these videos, having AI use quotas they have to meet

2 months ago 2 0 0 0

I had the same idea and looked into it a bit too. luckily meta only has one physical device, so maybe you can check for the meta-assigned MAC range

if it's streaming, it's probably dumping tons of data, so it should be visible. if it's recording, they only have a minute before it has to send to device
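
a sketch of the MAC prefix check, assuming you already have a list of MAC addresses seen nearby. the prefixes below are placeholders, not real vendor assignments; the actual ones would have to come from the IEEE registry. devices that randomize their MAC would slip past this.

```python
# Placeholder OUI prefixes; NOT real vendor assignments. Look up the
# actual prefixes in the IEEE registry before using this.
HYPOTHETICAL_VENDOR_OUIS = {"AA:BB:CC", "12:34:56"}


def normalize_mac(mac: str) -> str:
    """Return the MAC as uppercase, colon-separated hex pairs."""
    digits = "".join(ch for ch in mac if ch.isalnum()).upper()
    if len(digits) != 12 or any(ch not in "0123456789ABCDEF" for ch in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def is_vendor_device(mac: str, ouis: set[str] = HYPOTHETICAL_VENDOR_OUIS) -> bool:
    """Check whether the first three bytes (the OUI) match a known prefix."""
    return normalize_mac(mac)[:8] in ouis
```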

2 months ago 3 0 1 1

especially if this ties into actual validated, manual moderation afterwards, this sounds like a really great way to get a dataset to SFT a model

pangram themselves probably have something really similar set up for their own training

2 months ago 2 0 0 0

yep exactly, i probably wouldn't explicitly prompt the model that it's meant to pick it out for ai detection, rather that it should pick out the most important paragraph or something like that

the key is making the task sound as close as you can to how you think google trains it

2 months ago 2 0 1 0

looking quickly at the pangram api pricing, it seems like 10-50x more expensive per token than a model like gemini 3 flash

so it would probably make sense to use a cheap solid model to pull out a single paragraph for pangram to analyze, or maybe a few, would take some fiddling

2 months ago 1 0 1 0

ya fully agreed! if you keep the same interface for both options it would make it nice to compare and make sure a better solution is actually doing better
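
a small sketch of the shared-interface idea. both detectors are toy stand-ins (hypothetical names and rules, not real services); the point is only that anything with the same `is_ai` method can be scored by the same harness.

```python
from typing import Protocol


class Detector(Protocol):
    """Anything that can say whether a text looks AI-written."""

    def is_ai(self, text: str) -> bool: ...


class KeywordDetector:
    """Toy baseline standing in for one option."""

    def is_ai(self, text: str) -> bool:
        return "delve" in text.lower()


class LengthDetector:
    """Toy alternative standing in for a second option."""

    def is_ai(self, text: str) -> bool:
        return len(text) > 40


def accuracy(detector: Detector, labeled: list[tuple[str, bool]]) -> float:
    """Fraction of labeled examples the detector gets right."""
    correct = sum(detector.is_ai(text) == label for text, label in labeled)
    return correct / len(labeled)
```

run `accuracy` on the same labeled set for each option and you can see whether the fancier one is actually doing better.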

2 months ago 1 0 1 0

If this was easy, exa and other companies would provide this out of the box.

if you can get decent results with gemini or anthropic, then it may be worth digging into more

2 months ago 1 0 1 0

The quality of your results is heavily dependent on their web crawler and whether it can filter out ads, banners, other random text, etc.

You can estimate prices pretty well if you assume a fixed max input length and use a model without reasoning and with structured output to get a yes/no response
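
The estimate is simple arithmetic once the input length is capped. A sketch, where all prices and token counts are hypothetical inputs you would fill in from the provider's pricing page:

```python
def estimate_cost_usd(
    num_pages: int,
    max_input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Upper-bound cost: every page is assumed to hit the input cap."""
    input_cost = num_pages * max_input_tokens * input_price_per_million / 1_000_000
    output_cost = num_pages * output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost
```

With no reasoning tokens and a yes/no structured output, the output term stays tiny, so the input cap is what drives the total.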

2 months ago 1 0 1 0

chiming in here randomly because this is interesting, it is probably best to prototype this with one of the big ai platforms which provide models + web fetch together for simplicity.

2 months ago 1 0 1 0

funny how this is now true for graphics as well as llms

2 months ago 0 0 0 0

He did a pretty good song with ozzy at one point

2 months ago 1 0 0 0

But this made me think, what would a good example of a reddit hive mind type thing be?

Moltbot: Except each agent gets 10k tokens of the epstein files it is responsible for and is tasked to work together to compile a list of names of the criminals

First thing it needs to do is post a summary

2 months ago 1 0 0 0

I do think it would be a cool experiment to RL train a model to make the posts that get the most upvotes from other models. You are then simulating a content creator lol

Is that the "purpose" of social media :think:

2 months ago 1 0 1 0

When I think about these questions, I think "how would I train this?"

You can generate examples of the model reading 10 posts and commenting on one, but there is no "purpose" there.

I can imagine doing RL for it, like if an agent posts on stack overflow if it can't figure something out

2 months ago 2 0 1 0

Is this voiced by Matt Mercer???

2 months ago 8 0 0 0

An AI Toy Exposed 50,000 Logs of Its Chats With Kids to Anyone With a Gmail Account

AI chat toy company Bondu left its web console almost entirely unprotected. Researchers who accessed it found nearly all the conversations children had had with the company's stuffed animals.

The AI-chat-enabled stuffed toy Bondu invites little kids to have intimate conversations with it, like an LLM imaginary friend. It also exposed virtually all their chats on a web interface with no security. Anyone with a Gmail account could log in and read transcripts. www.wired.com/story/an-ai-...

2 months ago 273 162 14 32

We see these things, but we've also been spending a year thinking about agents, and now we talk to them all day every day in things like claude code. Plus as engineers we have some vague understanding of how they work.

There's no difference between agent and wizard to 99.99% of people in the world

2 months ago 4 0 0 0

One thing that is interesting to try is to see if the model assumes the container is "codex like", as in it has things built in that come from codex. The most common thing is for the model to use apply_patch as a cli command to modify files, which comes from codex. Codex gives lots of neat hints!

2 months ago 0 0 0 0

thank you for trying! i wonder if you have to set it when enabling the tool on the api platform.openai.com/docs/guides/...

Really does feel like Anthropic's containers are better, especially with how long you can keep them alive

2 months ago 0 0 1 0

Have you tried having two api requests referencing the same container work at the same time?

Curious if you can force bugs or errors to happen when two processes modify the same file concurrently

2 months ago 0 0 1 0

the difference between making a product based on prompting vs RL training a model to complete objectives and doing it well

this is why RL is such a big deal!

2 months ago 0 0 0 0

thank you!

2 months ago 2 0 0 0

yes please! i'm interested :)

2 months ago 3 0 0 0
