The search function doesn't even work at all anymore on Bluesky?
Posts by Aaron Meurer
Now can we have subproblem breakouts like this for literally every other eval?
Grok W
lol there are apparently some people so convinced that AI doesn't work that they believe its popularity must be a mass psychosis event.
(I just want to add, for Bluesky, that this stuff has been around for almost 2 years now and it's honestly starting to get embarrassing if you haven't tried them enough to actually figure this out by now)
So while it's easy to "own" an AI by getting it to give a stupid answer to a simple question, you shouldn't let this fool you about its capabilities for things that you'd actually use them for.
But LLMs are a very different kind of intelligence. They can be very smart at one thing and very dumb at another. (LLMs actually do also have their own intelligence correlations, but these are not really obvious even to people who use them a lot).
That's why we ask stupid questions in interviews like whiteboard coding puzzles or "what's your biggest weakness?". Those things don't actually directly matter for the job, but they correlate enough that we can infer things from them.
The reason for this is a bit unintuitive. We're used to being able to use proxy questions to judge intelligence because for humans, certain tasks correlate with each other and we have a good intuition for this.
If you want to know how good an AI is, ask it a question about something you actually care about, not proxy questions that you would never actually want to know the answer to.
Ask it how it would fix a problem you had yesterday, not how many r's are in "strawberry".
Why do you need to know how many b's there are in "blueberry"?
LLMs do not use Tesseract. They read the text off the image directly.
Context poisoning from looping to fix errors is something that needs to be fixed in the agent runners. When using chat apps I fix this by starting new chats or editing previous queries to keep the context down. Agents need to do something similar where they delete the "fixup" loop from the context.
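A minimal sketch of what that pruning could look like inside an agent runner. The message format (`role` keys, an `is_error` flag on tool results) is a made-up stand-in, not any real API:

```python
def prune_fixup_loops(messages):
    """Once a retried tool call succeeds, drop the earlier failed
    attempt/error pairs from the context, keeping only the version
    that worked.

    Assumes a hypothetical message format: assistant tool calls
    followed by tool results, where failed results carry is_error=True.
    """
    pruned = []
    for msg in messages:
        # A successful tool result means the preceding failed
        # attempt/error pairs are now just noise in the context.
        if msg["role"] == "tool" and not msg.get("is_error"):
            while (len(pruned) >= 3
                   and pruned[-2].get("is_error")
                   and pruned[-3]["role"] == "assistant"):
                # Delete the failed attempt and its error output
                del pruned[-3:-1]
        pruned.append(msg)
    return pruned
```

With a transcript of two failed attempts followed by a working one, this keeps only the final attempt and its successful result, which is roughly what editing a previous query does manually in a chat app.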
So what ended up being the resolution of that whole Bluesky Gaza spam thing?
I'm not sure that it actually helps that Bluesky hides the posts you've blocked. Makes posts like these seem like you're being hyperbolic unless you actually go and look up blocked posts on skythread.
Regarding your last paragraph, what do you think of Meta's AI app? x.com/venturetwins...
SymPy GitHub page not logged in.
SymPy GitHub page logged in.
Why is the GitHub mobile interface different depending on whether you're logged in or not?
The tabs move to the top...the star button gets smaller...more links like "code of conduct" appear... WTF?
Still can't disable reposts from specific accounts though, so I'll still not be following a lot of you unfortunately.
They finally added the ability to get push notifications only for mentions on this app. So while I'll still not use it much I'll at least now see it if you mention me.
The one at home is the clone. The one still at teleporter is the original.
Whatever happened to the $100B in profits definition of AGI?
This post doesn't even answer the question though
Most people who fly on planes survive. Let's hear from one of the people who died.
It can understand English and follow arbitrary instructions.
It didn't work for me
As long as the alternatives actually are better, we should expect them to cost more, no?
The reason is that GitHub has some random ways it treats "fork" repos differently from normal repos, which are going to be annoying if you plan to do your primary development there.
PSA: if you ever want to "fork" a project in the OSS sense, don't use the GitHub "fork" feature. Create a new repository and copy the history over instead.
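A minimal sketch of the copy-over approach. The function name, URLs, and directory name are placeholders; the destination repository should be created empty via "new repository", not GitHub's fork button:

```python
import subprocess

def copy_fork(upstream_url, new_origin_url, dest="project-copy"):
    """'Fork' a project by copying its full history into a fresh,
    non-fork repository (new_origin_url must point at an EMPTY repo)."""
    def git(*args, cwd=None):
        subprocess.run(["git", *args], cwd=cwd, check=True)

    # Clone the upstream project with its complete history
    git("clone", upstream_url, dest)
    # Keep upstream around as a remote so you can still merge their changes
    git("remote", "rename", "origin", "upstream", cwd=dest)
    git("remote", "add", "origin", new_origin_url, cwd=dest)
    # Push every branch and tag; GitHub sees an ordinary repository,
    # with none of the special "fork" behavior
    git("push", "origin", "--all", cwd=dest)
    git("push", "origin", "--tags", cwd=dest)
```

The same steps work as plain `git` commands; the point is just that the new repo starts life as a normal repository rather than one GitHub has flagged as a fork.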
defaults write com.google.Chrome GeminiSettings -int 1
I think what's really a bad sign for Bluesky is not just that there isn't a usable algorithm, but that the tech exists to allow anyone to build their own algorithm, and yet basically no one has done that.