
Posts by Max Little

This recent RCT of an "AI stethoscope" claims the technology "shows promise" for diagnosing cardiovascular conditions.

It does not.

It is a textbook example of the risks of conducting unprincipled 'per-protocol analyses'. Once again, peer review at a major medical journal has failed.

🧵 1/

1 month ago 436 187 8 33

Depends on how you spin it I guess (screenshots from 2 different articles).

People working at universities are pushed so incredibly hard to ensure that every study is a breakthrough that they just...lie. All the time. Probably without even realizing it.

www.standard.co.uk/news/tech/im...

7 months ago 30 8 4 2
Predictions Scorecard, 2025 January 01 – Rodney Brooks

Every Jan 1 I post a scorecard on predictions I made, with dates, on Jan 1, 2018, about self-driving cars, robots, AI & ML, and human spaceflight. Besides noting which turned out right and which wrong over the last year, I also talk a lot of smack about these topics. rodneybrooks.com/predictions-...

1 year ago 110 45 7 11
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics Do large language models (LLMs) solve reasoning tasks by learning robust generalizable algorithms, or do they memorize training data? To investigate this question, we use arithmetic reasoning as a rep...

LLMs can also be seen as big bags of heuristics: arxiv.org/abs/2410.21272

1 year ago 8 2 0 0
U.S. science funding agencies roll out policies on free access to journal articles NIH and DOE are first to act, with implementation by all set to begin by end of 2025

www.science.org/content/arti...

1 year ago 13 8 1 1

๐—ข๐Ÿฏ ๐˜„๐—ฎ๐˜€ ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ฒ๐—ฑ ๐—ผ๐—ป ๐Ÿณ๐Ÿฑ% ๐—ผ๐—ณ ๐˜๐—ต๐—ฒ ๐—ฝ๐˜‚๐—ฏ๐—น๐—ถ๐—ฐ ๐˜€๐—ฒ๐˜ ๐—ณ๐—ผ๐—ฟ ๐—”๐—ฅ๐—–-๐—”๐—š๐—œ.

OpenAI did not disclose this in the video. Sam said they didn't target the test.

Never trust a staged demo.
Never trust a product you haven't tried.
Never trust OpenAI.

1 year ago 387 55 25 15
o3, AGI, the art of the demo, and what you can expect in 2025 OpenAI's new model was revealed yesterday; its most fervent believers think AGI has already arrived. Here's what you should pay attention to in the coming year.

o3, AGI, and the art of the demo. Long read on what OpenAI didn't tell you yesterday. garymarcus.substack.com/p/o3-agi-the...

1 year ago 63 12 9 3

Likewise, a simple adversarial strategy beats "superhuman" Go-playing algorithms: goattack.far.ai. It's wise to remember that there is no scientific consensus on what "intelligence" actually is.

1 year ago 4 1 0 0

Just for those who don't know: the vast majority of open problems in maths are not numerical in nature.

1 year ago 1 0 0 0

The questions have numerical answers, so it is easy to check whether it gets them right.

1 year ago 1 1 1 1

How many times do we have to see this same movie, where an AI beats some benchmark and influencers gleefully shout "It's So Over" without even trying out the AI, and then on careful inspection the AI turns out to not be robust or reliable?

Thousands?

(It's already been hundreds.)

1 year ago 74 9 7 1

It seems that OpenAI's latest model, o3, can solve 25% of problems on a benchmark called FrontierMath, created by EpochAI, where previous LLMs could only solve 2%. On Twitter I am quoted as saying, "Getting even one question right would be well beyond what we can do now, let alone saturating them."

1 year ago 86 8 8 1
Scholars Are Supposed to Say When They Use AI. Do They? Journals have policies about disclosing ChatGPT writing, but enforcing them is another matter, according to a new study.

It's widely agreed that scholars are supposed to say when they use ChatGPT. Yet phrases like "I am an AI language model" (with no disclosure) are popping up in papers.

I wrote about how journals seemingly aren't enforcing their AI policies, according to a new study: www.chronicle.com/article/scho...

1 year ago 52 21 1 4
Is AI progress slowing down? Making sense of recent technology trends and claims

This seems like a pretty balanced commentary. They certainly get this right: "connection between capability improvements & AI's social or economic impacts is extremely weak. The bottlenecks for impact are the pace of product development and the rate of adoption" www.aisnakeoil.com/p/is-ai-prog...

1 year ago 18 4 1 1

Good reporting here, but sadly, these tragedies were predictable. Those of us who actually work on machine learning know that deep-learning based computer vision simply isn't reliable enough for safety-critical applications such as self-driving cars. @garymarcus.bsky.social @filippie509.bsky.social

1 year ago 9 1 0 0
When does generative AI qualify for fair use?

The late Suchir Balaji's blog post on AI, copyright and fair use, reposted in his memory.

suchir.net/fair_use.html

1 year ago 124 36 4 4
The bootstrap can be used to generate a new random sample from an existing random sample. Its validity is guaranteed by the Glivenko-Cantelli theorem, which shows that the empirical cumulative distribution function (CDF, top panel) converges uniformly to the true CDF of the underlying distribution (bottom panel).
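A minimal sketch of the resampling step in NumPy (illustrative names; standard-normal data assumed for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Original sample from an unknown distribution (here: standard normal).
sample = rng.normal(size=1000)

# Bootstrap: draw a new sample of the same size, with replacement,
# from the empirical distribution of the original sample.
boot = rng.choice(sample, size=sample.size, replace=True)

def ecdf(data, x):
    # Empirical CDF at x: fraction of data values <= x.
    return np.searchsorted(np.sort(data), x, side="right") / data.size

# By Glivenko-Cantelli, a large sample's ECDF is uniformly close to the
# true CDF, so the bootstrap sample's ECDF tracks the original's.
x = np.linspace(-3, 3, 7)
print(np.max(np.abs(ecdf(sample, x) - ecdf(boot, x))))
```

The printed maximum ECDF gap should be small (a few percent) for a sample of this size, which is exactly what the two panels illustrate.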

1 year ago 0 1 0 0

For an increasing function f: ℝ → ℝ, max(f(a), f(b)) = f(max(a, b)). An important special case is f(x) = x + c, for which we obtain max(a + c, b + c) = c + max(a, b).
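Both identities are easy to spot-check numerically; a tiny sketch using f = exp as an (assumed, illustrative) increasing function:

```python
import math

def commutes_with_max(f, a, b):
    # For increasing f, taking the max before or after applying f agrees.
    return max(f(a), f(b)) == f(max(a, b))

assert commutes_with_max(math.exp, -1.2, 3.4)

# Special case f(x) = x + c: the constant shift pulls out of the max.
a, b, c = 2.0, 5.0, 7.0
assert max(a + c, b + c) == c + max(a, b)
print("identities hold")
```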

1 year ago 0 1 0 0

I believe GM came to exactly this realization and decided (likely very wisely, in my opinion) not to throw more good money after bad.

1 year ago 0 0 0 0

Since 2016, Waymo has raised ~$25B, so they burn ~$3B/year, or a little over $8M/day. With ~700 cars, assuming each car operates every day, it costs them over $11k to operate each car per day. $11k PER DAY per CAR. If you don't find this ridiculous, I don't know what else to say.
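A back-of-envelope check of that arithmetic, taking the post's figures (~$25B raised, ~$3B/year burn, ~700 cars) at face value:

```python
# Figures quoted in the post, not independently verified.
total_raised = 25e9
burn_per_year = 3e9
cars = 700

implied_years = total_raised / burn_per_year   # ~8.3 years since 2016
burn_per_day = burn_per_year / 365             # ~$8.2M/day
cost_per_car_per_day = burn_per_day / cars     # ~$11.7k per car per day

print(f"~{implied_years:.1f} years of funding consumed; "
      f"${burn_per_day / 1e6:.1f}M/day; "
      f"${cost_per_car_per_day / 1e3:.1f}k per car per day")
```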

1 year ago 8 3 2 0

Suchir Balaji was a good young man. I spoke to him six weeks ago. He had left OpenAI and wanted to make the world a better place. This is tragic.

1 year ago 161 46 8 4

Very proud of the Birmingham HDRUK PhDs!

1 year ago 1 0 0 0

Health Data Research UK PhD meet! Work from Ant Lee and Jianqiao Mao (latter with @maxal.bsky.social)

1 year ago 2 1 0 1

Apple "Intelligence". @garymarcus.bsky.social

1 year ago 3 0 0 0

And, not usually mentioned, is just how many "non-driver" human roles Waymo are heavily relying upon, e.g. teleoperation, stuck-vehicle retrieval, repairs, maintenance, cleaning, passenger support, etc. @rodneyabrooks.bsky.social

1 year ago 0 0 0 0

As first predicted some 10 years ago, this is how "self-driving cars" will end: as glorified driver-assistance features. The graveyard of autonomous-vehicle efforts is already pretty crowded, with pretty much only Waymo remaining, until life support from the Google mothership ends.

1 year ago 9 2 1 0

What if all the hype just didn't turn out to be true?

Evidence of productivity gains is mixed - yet hypey takes continue to dominate in the media.

1 year ago 43 9 7 0
Don't Ride This Bike! Generative AI's persistent trouble with compositionality and parts When the text-to-image AI generation system DALL-E2 was released in April 2022, the two of us, together with Scott Aaronson, ran some informal experiments to probe its abilities.

Don't Ride This Bike! Generative AI's persistent trouble with compositionality and parts, by Gary Marcus @garymarcus.bsky.social and Ernest Davis / Marcus on AI - Substack garymarcus.substack.com/p/dont-ride-...

1 year ago 11 3 0 0

Most of these sorts of algorithms are just AI snake oil: they don't work because there is no way to quantify these sorts of 'social variables'. They are never actually tested to any level of scientific rigour.

1 year ago 3 0 0 0

Not quite: AI got people excited about interpolation, it seems. Numerical analysts suddenly feel seen.

1 year ago 3 0 0 0