also a claude distillation benchmark lol
i'm glad someone went ahead and built it! great work bsky.app/profile/cool...
aha! i had to come back to this thread too @faineg.bsky.social you have anything to do with this
i wanted to disagree with this but i do think joe rogan caused more societal harm than all ai companions have
i think the answer is sadly that ai companions tend to make people insane and outcasts vs influencers make people feel like part of a group
we need to keep making them seem cringe
which one is more of a girl game, dragon age origins or baldurs gate 3
agreed! even with classifier llms there are so many optimizations when doing things like this
with
- almost no latency requirements
- tiny, fixed max input lengths
- tiny 1-5 token output lengths
- really small models
you can batch super heavily and get a lot of work done on private hardware
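a rough back-of-envelope sketch of why those constraints make batching so effective. all the numbers here (prefill/decode rates, token counts) are made-up placeholders, not benchmarks:

```python
# Back-of-envelope throughput estimate for a small classifier LLM.
# All rates and lengths below are illustrative assumptions, not measurements.

def classifications_per_second(prefill_tok_s, decode_tok_s,
                               max_input_tokens, output_tokens, batch_size):
    """Estimate fixed-length classification requests finished per second."""
    # Time to prefill one batch of fixed-length inputs...
    prefill_time = (max_input_tokens * batch_size) / prefill_tok_s
    # ...plus a handful of decode steps, which are shared across the batch.
    decode_time = output_tokens / decode_tok_s
    return batch_size / (prefill_time + decode_time)

# Hypothetical small model: 256-token inputs, 3-token outputs.
single = classifications_per_second(50_000, 60, 256, 3, batch_size=1)
batched = classifications_per_second(50_000, 60, 256, 3, batch_size=64)
print(f"batch=1:  {single:.1f} req/s")
print(f"batch=64: {batched:.1f} req/s")
```

with tiny outputs, the per-step decode cost dominates at batch size 1, so amortizing it over a big batch is nearly free throughput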
It's like the bell curve meme
agents are just a while loop
agents are flow charts and prompts and rag and personas and planners and scientists
agents are just a while loop
i think it is a fun journey to ride this bell curve and really get a feel for why
Seeing these types of things makes me appreciate that I've been following along with the growth of AI and learning and trying things as soon as they're released. It took a long time!
I can't imagine the pressure people feel seeing these videos, having AI usage quotas they have to meet
I had the same idea and looked into it a bit too. luckily meta only has one physical device, so maybe you can check for the meta-assigned MAC range
if it's streaming, it's probably dumping tons of data, so it should be visible. if it's recording, they only have a minute before it has to send to the device
especially if this ties into actual validated, manual moderation afterwards, this sounds like a really great way to get a dataset to SFT a model
pangram themselves probably has something really similar set up for their own training
yep exactly, i probably wouldn't explicitly prompt the model that it's meant to pick it out for ai detection, rather that it should pick out the most important paragraph or something like that
the closer you can make the task sound to how you think google trains it is the key
looking quickly at the pangram api pricing, it seems like 10-50x more expensive per token than a model like gemini 3 flash
so it would probably make sense to use a cheap solid model to pull out a single paragraph for pangram to analyze, or maybe a few, would take some fiddling
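the cost math for that two-stage setup is easy to sketch. prices and token counts below are hypothetical placeholders (the 10-50x ratio is the only thing taken from the thread):

```python
# Rough cost comparison: run an expensive per-token detector on a whole
# document vs. on one paragraph extracted by a cheap model first.
# Prices are hypothetical placeholders (USD per 1M input tokens).

CHEAP_MODEL_PRICE = 0.10   # assumed cheap extraction model
DETECTOR_PRICE = 3.00      # assumed 10-50x pricier detector

def cost_usd(tokens, price_per_million):
    return tokens * price_per_million / 1e6

doc_tokens = 4_000         # full article (assumed)
paragraph_tokens = 200     # one extracted paragraph (assumed)

# Option A: send the whole document to the detector.
full_scan = cost_usd(doc_tokens, DETECTOR_PRICE)

# Option B: cheap model reads the doc, detector only sees one paragraph.
two_stage = (cost_usd(doc_tokens, CHEAP_MODEL_PRICE)
             + cost_usd(paragraph_tokens, DETECTOR_PRICE))

print(f"full scan: ${full_scan:.4f}")
print(f"two stage: ${two_stage:.4f}")
```

under these assumed numbers the two-stage version comes out far cheaper, which is the whole argument for the fiddling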
ya fully agreed! if you keep the same interface for both options it would make it nice to compare and make sure a better solution is actually doing better
If this was easy, exa and other companies would provide this out of the box.
if you can get decent results with gemini or anthropic, then it may be worth digging into more
The quality of your results is heavily dependent on their web crawler, i.e. whether it can filter out ads, banners, other random text, etc.
You can estimate prices pretty well if you assume a fixed max input length and use a model without reasoning + structured output to get a yes/no response
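a minimal sketch of that estimate, assuming hypothetical per-token prices and a single structured-output token per call:

```python
# Worst-case cost per yes/no classification, assuming a fixed max input
# length and structured output (~1 output token, no reasoning tokens).
# Prices are placeholder assumptions, not real rates.

INPUT_PRICE_PER_M = 0.30    # hypothetical $/1M input tokens
OUTPUT_PRICE_PER_M = 1.20   # hypothetical $/1M output tokens

def max_cost_per_call(max_input_tokens, output_tokens=1):
    """Upper bound on cost: every call pays for the full max input length."""
    return (max_input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1e6

per_call = max_cost_per_call(2_000)   # assumed 2k-token cap
per_100k = per_call * 100_000
print(f"per call: ${per_call:.6f}")
print(f"per 100k calls: ${per_100k:.2f}")
```

capping input length turns the fuzzy "what will this cost" question into simple multiplication, which is why the fixed-length assumption matters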
chiming in here randomly because this is interesting: it's probably best to prototype this with one of the big ai platforms that provide models + web fetch together, for simplicity
funny how this is now true for graphics as well as llms
He did a pretty good song with Ozzy at one point
But this made me think, what would a good example of a reddit hive mind type thing be?
Moltbot: Except each agent gets 10k tokens of the epstein files it is responsible for and is tasked to work together to compile a list of names of the criminals
First thing it needs to do is post a summary
I do think it would be a cool experiment to RL train a model to make the posts that get the most upvotes from other models. You are then simulating a content creator lol
Is that the "purpose" of social media :think:
When I think about these questions, I think "how would I train this?"
You can generate examples of the model reading 10 posts and commenting on one, but there is no "purpose" there.
I can imagine doing RL for it, like if an agent posts on stack overflow if it can't figure something out
Is this voiced by Matt Mercer???
The AI-chat-enabled stuffed toy Bondu invites little kids to have intimate conversations with it, like an LLM imaginary friend. It also exposed virtually all their chats on a web interface with no security. Anyone with a Gmail account could log in and read transcripts. www.wired.com/story/an-ai-...
We see these things, but we've also been spending a year thinking about agents, and now we talk to them all day every day in things like claude code. Plus as engineers we have some vague understanding of how they work.
There's no difference between agent and wizard to 99.99% of people in the world
One thing that is interesting to try is to see if the model assumes the container is "codex like", as in it has things built in that come from codex. The most common thing is for the model to use apply_patch as a cli command to modify files, which comes from codex. Codex gives lots of neat hints!
thank you for trying! i wonder if you have to set it when enabling the tool on the api platform.openai.com/docs/guides/...
Really does feel like Anthropic's containers are better, especially with how long you can keep them alive
Have you tried having two api requests referencing the same container work at the same time?
Curious if you can force bugs or errors to happen when two processes modify the same file concurrently
the difference between making a product based on prompting vs RL training a model to complete objectives and doing it well
this is why RL is such a big deal!
thank you!
yes please! i'm interested :)