John Q Public (@conjurial) Bsky

The Roman Senate making a comeback, I like it

19 hours ago 2 0 0 0

Reading today's open-closed performance gap: The complex factors that determine the single evaluation number so many focus on. Plus, how this changes in the future.

A TLDR is that unless the training dynamics of leading LLMs change or open model builders run out of money, this ~6 month performance gap from closed to open models is here to stay.
www.interconnects.ai/p/reading-to...

1 day ago 22 3 1 1

I've heard it said, though I don't really believe it, that as many as three things might be true.

1 day ago 43 7 2 1

The first panel shows a crow with the title "How to live a good life". The second panel shows a crow cawing at itself in the mirror with the subheading "Make friends". The next panel says "Explore" and shows a crow looking into a commercial waste bin. The next says "Try new things" with a crow eating something vile. The next one says "Be curious" and shows the crow grabbing a hissing cat's tail. The final frame says "Get a hobby" and shows the crow looking closely at a book of matches.

How To Live A Good Life #oldknees

1 day ago 9484 2731 22 32

“We’re gonna give only good science-based health advice but in a way that sounds good if you’re ‘doing your own research’” would be a really interesting sociological experiment

1 day ago 8 4 0 0

Alex Jones-inflected conspiracizing about things that are actually true would be funny

THEY don’t WANT you to know about the huge health benefits of this magic little supplement called lipitor

1 day ago 8 2 1 0

I *hate* time zones and I especially hate how the same piece of land is in different time zones at different times

1 day ago 4 0 0 0

*looking

1 day ago 0 0 0 0

so a) I don’t think reasoning has to be definitionally perfect: you can reason incorrectly, realize it, try again

b) the outputs resemble human reasoning and scientifically I’m really looming forward to more research illuminating how the LLM process and the human process are and aren’t similar

1 day ago 0 0 1 0

but it works well at some of the same tasks humans do indisputably need to engage in reasoning to solve. Also clearly works differently, fails in cases a human wouldn’t, different strengths/weaknesses

You can get across water by swimming or kayaking but you did eventually get to the other side

1 day ago 6 0 1 0

They can be made to draw statistical inferences in a compositional, abstractive, hierarchical way in the output token stream, and doing so greatly improves performance.

The field calls this “reasoning” and whether it really is is an interesting question for the cognitive scientists and philosophers

1 day ago 5 0 1 0

As the old joke goes, “can a submarine swim?”

And as the answer goes, “faster than you”

1 day ago 2 0 0 0

Testing theory of mind in large language models and humans - Nature Human Behaviour: Testing two families of large language models (LLMs) (GPT and LLaMA2) on a battery of measurements spanning different theory of mind abilities, Strachan et al. find that the performance of LLMs can mi...

inferring intent and responding appropriately is a thing the existing literature evaluates LLMs on! results are uneven but not consistent with the idea they can't do it at all

1 day ago 0 0 0 0

A Survey of Uncertainty Estimation Methods on Large Language Models Zhiqiu Xia, Jinxuan Xu, Yuqian Zhang, Hang Liu. Findings of the Association for Computational Linguistics: ACL 2025. 2025.

not reliably!

people are working on it though

1 day ago 1 0 0 0

1 day ago 2 0 0 0

I’ll need at least 10 million to properly practice

1 day ago 6 0 0 0

this does require that it be able to tell whether it's accomplished the goal. which sounds stupid when I say it like that but is really imo the fundamental constraint on where and how agents are useful

1 day ago 5 0 0 0

a very...wait for it...rational attitude

1 day ago 6 0 1 0

I described this as "executable citations" once and was told by a friend -- another PhD student no less -- that I should go outside

and damn if that wasn't the truth

1 day ago 36 1 1 1

there was this fabulous New Yorker cartoon I can't find at the moment about a kid telling his dad "I don't *want* to be a brilliantly creative tortured artist, I want to mine coal"

1 day ago 7 0 1 0

God made the integers, all else is the work of electrical engineers

1 day ago 24 0 1 0

this is also my issue with the no free lunch theorems: things that are simple enough to physically exist are a null set

1 day ago 7 0 1 0

the entire set of IEEE 754 representable numbers has Lebesgue measure 0

1 day ago 14 0 3 0

feel like I've seen this in a sci fi movie somewhere eh

1 day ago 1 0 0 0

two images of the human body's circulatory system. One of them with good cable management

The human circulatory system, before and after proper cable management.

2 days ago 19710 3826 290 269

like it's true that given it can run at all, a robot being able to run faster than a human is not surprising, cars are faster than bikes, etc etc

but I don't think people understand how hard it is to be able to run at all

1 day ago 13 0 2 0

the most principled argument imo is the straightforwardly religious one: God gave us souls but not the AIs, therefore...

many people now don't believe this argument but also don't like the consequences of discarding it

2 days ago 5 0 1 0

Theorem: All numbers are interesting.

Proof: Assume toward a contradiction that some numbers are uninteresting. Then there is a smallest such number, which makes that number interesting. ⊥

2 days ago 1 0 0 0

imo most of the reason people are scared of nuclear reactors is that there, you can't say this argument is fake, you have to explain how you've mitigated it, and that's inherently less convincing

2 days ago 9 0 0 0

the education mechanism will end up being "everyone had to go through an enterprise rollout at work" or knows somebody who did and can explain

2 days ago 1 0 0 0

Posts by John Q Public