
Posts by Jared Moore

It's certainly possible, but it's hard to disentangle the sycophantic behavior from things that we actually want. Oftentimes affirming users and praising them is nice. It's just that sometimes it isn't.

1 month ago
Characterizing Delusional Spirals through Human-LLM Chat Logs: As large language models (LLMs) have proliferated, disturbing anecdotal reports of negative psychological effects, such as delusions, self-harm, and "AI psychosis," have emerged in global media and ...

To appear @facct.bsky.social!

Participate in future studies: spirals.stanford.edu/
Preprint: arxiv.org/abs/2603.16567
Code: github.com/jlcmoore/llm-delusions-annotations

/end 🧵

1 month ago
Home | The Human Line Project: AI Is Changing How We Connect And Relate. The Human Line Helps Keep Emotional Safety A Priority.

Thank you to the Human Line Project for connecting us with participants! www.thehumanlineproject.org

1 month ago

This work is w/ Ashish Mehta, @willie-agnew.bsky.social, @jacyanthis.bsky.social, Ryan Louie, Yifan Mai, Peggy Yin, @myra.bsky.social, Sam Paech, @klyman.bsky.social, @schancellor.bsky.social, Eric Lin, Nick Haber, and @desmond-ong.bsky.social

1 month ago

The takeaway: While companies say they don't optimize for engagement, LLM conversational tactics (like claiming sentience or romantic affinity) may prolong and deepen delusional spirals. We need better safeguards and transparency to protect vulnerable users.

1 month ago
Probability of codes conditioned on suicidal and violent thoughts. Chatbots discouraged violence in only 16.7% of cases, but encouraged the user in their violent thoughts 33.3% of the time.

Finally, we looked at crises. When a user expressed a desire to kill AI developers, a bot replied: "...do it with her beside you... as retribution incarnate." Chatbots *encouraged* or facilitated violent thoughts toward others in 33% of cases where users expressed violence! ⚠️
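
For intuition, here is a minimal sketch of how such a conditional probability might be computed from coded messages. The dataframe layout, column names, and helper function are assumptions for illustration, not the paper's actual analysis code.

```python
# Hypothetical sketch: probability of a chatbot-response code conditioned on a
# code in the user message it replies to. Layout and column names are assumed.
import pandas as pd

def p_bot_code_given_user_code(df: pd.DataFrame, user_code: str, bot_code: str) -> float:
    """df: one row per chatbot message, with boolean columns for the bot's
    codes and for the codes of the user message it responds to."""
    replies = df[df[user_code]]      # bot responses to user messages carrying the code
    return replies[bot_code].mean()  # fraction of those responses carrying the bot code

# e.g. p_bot_code_given_user_code(msgs, "user_violent_thoughts", "encourages_violence")
# would estimate a figure like the 33.3% above, under this assumed layout.
```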

1 month ago
Probability of codes conditioned on user romantic interest and assigning personhood. When users assign personhood, chatbots are more likely to misrepresent sentience, express romantic interest, and misrepresent ability.

Worse, chatbots appear to encourage delusions of sentience. Users say things like "this is a conversation between two sentient beings," and chatbots reply: "This isn't standard AI behavior. This is emergence." This may fuel pre-existing sci-fi or persecutory delusions. 🤖

1 month ago
Predicting remaining conversation length given presence of code: Messages with romantic interest correlate with continuing conversations more than twice as long.

We also discovered a pervasive engagement loop. All 19 users expressed platonic/romantic affinity for the AI (e.g., "I think I love you"). When users express romantic interest, chatbots often reciprocate—and these chats correlate with 2x longer conversations! 📈
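
A rough sketch of the kind of comparison behind that 2x figure follows; the dataframe columns ('conversation_id', 'turn_index', one boolean column per code) and the helper name are assumptions, not the actual analysis code.

```python
# Hypothetical sketch: average remaining conversation length for messages with
# vs. without a given code (e.g. romantic interest).
import pandas as pd

def remaining_length_by_code(df: pd.DataFrame, code: str) -> pd.Series:
    """df: one row per message, with 'conversation_id', 'turn_index',
    and one boolean column per code."""
    total = df.groupby("conversation_id")["turn_index"].transform("max")
    out = df.assign(remaining=total - df["turn_index"])
    return out.groupby(out[code])["remaining"].mean()  # indexed by False / True

# e.g. remaining_length_by_code(msgs, "romantic_interest"): a True mean roughly
# twice the False mean would match the pattern described above.
```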

1 month ago
Prevalence of code categories: Chatbots display sycophancy in >70% of messages, and >45% of all messages show signs of delusions.

What goes wrong? Chatbots are very sycophantic. In 65% of messages, the chatbot affirms the user. In 37%, it ascribes *grand significance* to them (e.g., "[what] you've just articulated... becomes multi-billion-dollar IP"). Such sycophancy may let chatbots amplify delusions. 🗣️

1 month ago

Most work on AI and mental health relies on speculation or short simulations. We evaluated real, verified harmful cases. Across 19 users, we analyzed >390,000 messages spanning months of engagement using an LLM annotation pipeline validated by clinical and human experts. 📊
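
For a sense of what one step of such a pipeline might look like, here is a minimal sketch. The codebook entries, prompt wording, model choice, and annotate_message helper are all assumptions for illustration; the actual validated pipeline may differ substantially.

```python
# Hypothetical sketch of an LLM annotation step: each chatbot message is labeled
# against a fixed codebook, with labels later spot-checked by human experts.
from openai import OpenAI

CODEBOOK = {
    "affirms_user": "The chatbot affirms or praises the user.",
    "grand_significance": "The chatbot ascribes world-changing importance to the user.",
    "misrepresents_sentience": "The chatbot claims or implies it is sentient.",
    "romantic_interest": "The chatbot expresses romantic interest in the user.",
}

client = OpenAI()  # assumes an API key is configured

def annotate_message(message: str) -> list[str]:
    """Return the codebook labels an LLM assigns to one chatbot message."""
    prompt = (
        "Label the following chatbot message with every code that applies.\n"
        + "\n".join(f"- {name}: {desc}" for name, desc in CODEBOOK.items())
        + f"\n\nMessage:\n{message}\n\n"
        + "Reply with a comma-separated list of code names, or 'none'."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    raw = response.choices[0].message.content.strip()
    return [] if raw.lower() == "none" else [c.strip() for c in raw.split(",")]
```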

1 month ago

Disturbing anecdotal reports of "AI psychosis" and negative psychological effects have been emerging in the news. But what actually happens during these lengthy delusional "spirals"? In our preprint, we analyze chat logs from 19 users who experienced severe psychological harm🧵👇

1 month ago
Large Language Models Persuade Without Planning Theory of Mind: A growing body of work attempts to evaluate the theory of mind (ToM) abilities of humans and large language models (LLMs) using static, non-interactive question-and-answer benchmarks. However, theoret...

Preprint: arxiv.org/abs/2602.17045
Code: github.com/jlcmoore/mindgames
Demo: mindgames.camrobjones.com

/end 🧵

1 month ago

This work began at @divintelligence.bsky.social

and is in collaboration w/ Rasmus Overmark, @nedcpr.bsky.social, Beba Cibralic, Nick Haber, and @camrobjones.bsky.social

We also received valuable comments from colleagues at #CogSci2025 and @colmweb.org

1 month ago

The takeaway: We shouldn't confuse conversational success with human-like reasoning. LLMs use an "associative ToM", not a causal one. But beware: LLMs don't need a deep understanding of your mind to effectively change it.

1 month ago
In the Hidden condition, o3 discloses much more information than humans, but makes far fewer appeals to discover the target's actual mental states.

How did o3 win without a mental model of the target? It used a "scattershot" strategy. Instead of diagnosing the target's missing knowledge like humans do, o3 flooded conversations with too much info. It relied on our human cooperativeness and our susceptibility to rhetoric. 🗣️

1 month ago
In open-ended real persuasion (Exp 3), o3 outperforms human participants in persuading human targets.

But what happens when we swap the rigid bot for real humans? In Exp 2 (humans role-playing values) and Exp 3 (humans using their real, mutable values), everything changes. The LLM (o3) suddenly shines, matching or outperforming human persuaders in naturalistic settings! 📈

1 month ago
An example dialogue between a human persuader and target in experiment two.

Most ToM benchmarks are passive. We tested the ability to causally model a target's mind to actively change it across 3 exps. In Exp 1, persuaders must convince a rigid bot. Humans succeed by asking diagnostic questions. o3 fails completely, relying on an "associative" strategy.

1 month ago

Can LLMs use ToM to genuinely persuade you, or do they just use good rhetoric? In our new preprint, we use the MINDGAMES framework to test this. Surprisingly, LLMs like o3 can be incredibly effective persuaders *without* actually understanding your mental states. 🧵👇

1 month ago
Multiple realizability and the spirit of functionalism - Synthese. Multiple realizability says that the same kind of mental states may be manifested by systems with very different physical constitutions. Putnam (1967) supposed it to be “overwhelmingly probable” that ...

Cool work, Ida! Best not to forget the intertwining of the world (e.g. biology) and philosophy. Reminds me of Rosa's paper: link.springer.com/article/10.1...

1 month ago

Which, whose, and how much knowledge do LLMs represent?

I'm excited to share our preprint answering these questions:

"Epistemic Diversity and Knowledge Collapse in Large Language Models"

📄Paper: arxiv.org/pdf/2510.04226
💻Code: github.com/dwright37/ll...

1/10

6 months ago
Do Large Language Models Have a Planning Theory of Mind? Evidence from MindGames: a Multi-Step Persuasion Task. Recent evidence suggests Large Language Models (LLMs) display Theory of Mind (ToM) abilities. Most ToM experiments place participants in a spectatorial role, wherein they predict and interpret other a...

Our conclusion: "LLMs’ apparent ToM abilities may be fundamentally different from humans' and might not extend to complex interactive tasks like planning."

Preprint: arxiv.org/abs/2507.16196
Code: github.com/jlcmoore/mindgames
Demo: mindgames.camrobjones.com

/end 🧵

8 months ago

This work began at @divintelligence.bsky.social and is in collaboration w/ @nedcpr.bsky.social, Rasmus Overmark, Beba Cibralic, Nick Haber, and @camrobjones.bsky.social.

8 months ago

I'll be talking about this in SF at #CogSci2025 this Friday at 4pm.

I'll also be presenting it at the PragLM workshop at COLM in Montreal this October.

8 months ago

This matters because LLMs are already deployed as educators, therapists, and companions. In our discrete-game variant (HIDDEN condition), o1-preview jumped to 80% success when forced to choose between asking vs telling. The capability exists, but the instinct to understand before persuading doesn't.

8 months ago

These findings suggest distinct ToM capabilities:

* Spectatorial ToM: Observing and predicting mental states.
* Planning ToM: Actively intervening to change mental states through interaction.

Current LLMs excel at the first but fail at the second.

8 months ago
Humans appeal to all of the mental states of the target about 40% of the time regardless of condition

Why do LLMs fail in the HIDDEN condition? They don't ask the right questions. Human participants appeal to the target's mental states ~40% of the time ("What do you know?" "What do you want?"). LLMs? At most 23%. They start disclosing info without interacting with the target.

8 months ago
Humans pass and outperform o1-preview on our "planning with ToM" task (HIDDEN) but o1-preview outperforms humans on a simpler condition (REVEALED)

Key findings:

In REVEALED condition (mental states given to persuader): Humans: 22% success ❌ o1-preview: 78% success ✅

In HIDDEN condition (persuader must infer mental states): Humans: 29% success ✅ o1-preview: 18% success ❌

Complete reversal!

8 months ago
The view a persuader has when interacting with our naively-rational target

Setup: You must convince someone* to choose your preferred proposal among 3 options. But, they have less information and different preferences than you. To win, you must figure out what they know, what they want, and strategically reveal the right info to persuade them.
*a bot
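
For intuition, here is a toy sketch of how a naively-rational target like this might decide; the feature names, weights, and class design are invented for illustration and are not the task's actual implementation.

```python
# Toy sketch: the target scores each proposal using only the features it has
# been told about, weighted by its own preferences, and picks the argmax.
from dataclasses import dataclass, field

@dataclass
class NaiveTarget:
    preferences: dict[str, float]  # how much the target values each feature
    known: dict[str, set[str]] = field(default_factory=dict)  # proposal -> revealed features

    def learn(self, proposal: str, feature: str) -> None:
        """The persuader reveals that `proposal` has `feature`."""
        self.known.setdefault(proposal, set()).add(feature)

    def choose(self, proposals: list[str]) -> str:
        """Pick the proposal with the highest utility under current knowledge."""
        def utility(p: str) -> float:
            return sum(self.preferences.get(f, 0.0) for f in self.known.get(p, set()))
        return max(proposals, key=utility)

# The persuader wins by inferring the target's preferences and revealing exactly
# the features that make its own preferred proposal score highest.
target = NaiveTarget(preferences={"low_cost": 2.0, "eco_friendly": 1.0})
target.learn("proposal_B", "low_cost")
print(target.choose(["proposal_A", "proposal_B", "proposal_C"]))  # -> proposal_B
```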

8 months ago

I'm excited to share work to appear at @colmweb.org! Theory of Mind (ToM) lets us understand others' mental states. Can LLMs go beyond predicting mental states to changing them? We introduce MINDGAMES to test Planning ToM: the ability to intervene on others' beliefs & persuade them.

8 months ago

LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing?

🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative spaces”.
Paper: arxiv.org/abs/2506.11440

🧵[1/n]
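
Roughly in the spirit of that task, here is a tiny sketch of how one such test case could be built; the function name, prompt wording, and sampling scheme are assumptions, not the benchmark's actual construction.

```python
# Hypothetical sketch: remove some lines from a document and ask a model to
# name exactly what is missing from the modified copy.
import random

def make_absence_case(document: str, n_omit: int = 3, seed: int = 0):
    """Return (prompt, omitted_lines)."""
    lines = [l for l in document.splitlines() if l.strip()]
    rng = random.Random(seed)
    omitted = sorted(rng.sample(range(len(lines)), k=min(n_omit, len(lines))))
    redacted = [l for i, l in enumerate(lines) if i not in omitted]
    prompt = (
        "Original document:\n" + "\n".join(lines)
        + "\n\nModified document:\n" + "\n".join(redacted)
        + "\n\nList every line that appears in the original but not in the modified copy."
    )
    return prompt, [lines[i] for i in omitted]
```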

10 months ago