The search function doesn't even work at all anymore on Bluesky?
Posts by Aaron Meurer
Now can we have subproblem breakouts like this for literally every other eval?
Grok W
lol there are apparently some people so convinced that AI doesn't work that they believe its popularity must be a mass psychosis event.
(I just want to add, for Bluesky, that this stuff has been around for almost 2 years now and it's honestly starting to get embarrassing if you haven't tried them enough to actually figure this out by now)
So while it's easy to "own" an AI by getting it to give a stupid answer to a simple question, you shouldn't let this fool you about its capabilities for things that you'd actually use them for.
But LLMs are a very different kind of intelligence. They can be very smart at one thing and very dumb at another. (LLMs actually do also have their own intelligence correlations, but these are not really obvious even to people who use them a lot).
That's why we ask stupid questions in interviews like whiteboard coding puzzles or "what's your biggest weakness?". Those things don't actually directly matter for the job, but they correlate enough that we can infer things from them.
The reason for this is a bit unintuitive. We're used to being able to use proxy questions to judge intelligence because for humans, certain tasks correlate with each other and we have a good intuition for this.
If you want to know how good an AI is, ask it a question about something you actually care about, not proxy questions that you would never actually want to know the answer to.
Ask it how it would fix a problem you had yesterday, not how many r's are in "strawberry".
Why do you need to know how many b's there are in "blueberry"?
LLMs do not use Tesseract. They read the text off the image directly.
Context poisoning from looping to fix errors is something that needs to be fixed in the agent runners. When using chat apps I fix this by starting new chats or editing previous queries to keep the context down. Agents need to do something similar where they delete the "fixup" loop from the context.
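A minimal sketch of what that pruning could look like inside an agent runner. The message format (`role` keys, an `is_error` flag on tool results) is a made-up stand-in, not any real API:

```python
def prune_fixup_loops(messages):
    """Once a retried tool call succeeds, drop the earlier failed
    attempt/error pairs from the context, keeping only the version
    that worked.

    Assumes a hypothetical message format: assistant tool calls
    followed by tool results, where failed results carry is_error=True.
    """
    pruned = []
    for msg in messages:
        # A successful tool result means the preceding failed
        # attempt/error pairs are now just noise in the context.
        if msg["role"] == "tool" and not msg.get("is_error"):
            while (len(pruned) >= 3
                   and pruned[-2].get("is_error")
                   and pruned[-3]["role"] == "assistant"):
                # Delete the failed attempt and its error output
                del pruned[-3:-1]
        pruned.append(msg)
    return pruned
```

With a transcript of two failed attempts followed by a working one, this keeps only the final attempt and its successful result, which is roughly what editing a previous query does manually in a chat app.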
So what ended up being the resolution of that whole Bluesky Gaza spam thing?
I'm not sure that it actually helps that Bluesky hides the posts you've blocked. Makes posts like these seem like you're being hyperbolic unless you actually go and look up blocked posts on skythread.
Regarding your last paragraph, what do you think of Meta's AI app? x.com/venturetwins...
SymPy GitHub page not logged in.
SymPy GitHub page logged in.
Why is the GitHub mobile interface different depending on whether you're logged in or not?
The tabs move to the top...the star button gets smaller...more links like "code of conduct" appear... WTF?
Still can't disable reposts from specific accounts though, so I'll still not be following a lot of you unfortunately.
They finally added the ability to get push notifications only for mentions on this app. So while I'll still not use it much I'll at least now see it if you mention me.
The one at home is the clone. The one still at teleporter is the original.
Whatever happened to the $100B in profits definition of AGI?
This post doesn't even answer the question though
Most people who fly on planes survive. Let's hear from one of the people who died.
It can understand English and follow arbitrary instructions.
It didn't work for me
As long as the alternatives actually are better, we should expect them to cost more, no?
The reason is that GitHub has some random ways it treats "fork" repos differently from normal repos, which are going to be annoying if you plan to do your primary development there.
PSA: if you ever want to "fork" a project in the OSS sense, don't use the GitHub "fork" feature. Create a new repository and copy the history over instead.
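A minimal sketch of the copy-over approach. The function name, URLs, and directory name are placeholders; the destination repository should be created empty via "new repository", not GitHub's fork button:

```python
import subprocess

def copy_fork(upstream_url, new_origin_url, dest="project-copy"):
    """'Fork' a project by copying its full history into a fresh,
    non-fork repository (new_origin_url must point at an EMPTY repo)."""
    def git(*args, cwd=None):
        subprocess.run(["git", *args], cwd=cwd, check=True)

    # Clone the upstream project with its complete history
    git("clone", upstream_url, dest)
    # Keep upstream around as a remote so you can still merge their changes
    git("remote", "rename", "origin", "upstream", cwd=dest)
    git("remote", "add", "origin", new_origin_url, cwd=dest)
    # Push every branch and tag; GitHub sees an ordinary repository,
    # with none of the special "fork" behavior
    git("push", "origin", "--all", cwd=dest)
    git("push", "origin", "--tags", cwd=dest)
```

The same steps work as plain `git` commands; the point is just that the new repo starts life as a normal repository rather than one GitHub has flagged as a fork.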
defaults write com.google.Chrome GeminiSettings -int 1
I think what's really a bad sign for Bluesky is not just that there isn't a usable algorithm, but that the tech exists to allow anyone to build their own algorithm, and yet basically no one has done that.