
Posts by Daniel Wurgaft

Title page for the paper “A systematic framework for generating novel experimental hypotheses from language models”, with an epigraph from Jeff Elman describing how Rumelhart and McClelland (1982) did hypothesis generation with their connectionist network, and a figure describing our pipeline.

Announcing a new version of our 2024 paper on linguistic hypothesis generation from LMs!

@najoung.bsky.social and I have systematized our hypothesis generation framework, added stringent criteria for model selection, 10x-ed our learning trials, and included an epigraph from Jeff Elman 🙏!

4 days ago

Children exhibit visual understanding from limited experience, orders of magnitude less than our best models.

We introduce the Zero-shot World Model (ZWM). Trained on a single child's visual experience, BabyZWM rapidly achieves competence across diverse benchmarks with no task-specific training. 🧵

1 week ago
IRiSS Predoctoral Researcher, School of Humanities and Sciences, Stanford, California, United States. The Stanford Institute for Research in the Social Sciences (IRiSS) is seeking Predoctoral Researchers to participate in our 2026-2027 cohort. The...

Come join us! We have two research coordinator positions open with the Stanford IRiSS predoctoral program, a program designed to mentor students for graduate study:

LEVANTE: careersearch.stanford.edu/jobs/iriss-p...
BabyView: careersearch.stanford.edu/jobs/iriss-p...

(deadline 5/1)

2 weeks ago
about the lab – cognitive tools lab

The Cognitive Tools Lab at Stanford (cogtoolslab.github.io) is recruiting two new research staff members to join in AY 26-27.
Full-Time Lab Manager: forms.gle/UVwfx5wbY9Km....
IRiSS Predoc Researcher: iriss.stanford.edu/predoc/2026-....
Please share widely in your networks, thank you!!

1 week ago

The Causality in Cognition Lab -- a supportive, bluesky-colored team -- is looking for a predoc to join us! Here is info about the lab (cicl.stanford.edu) and the position (careersearch.stanford.edu/jobs/iriss-p...). The application deadline is May 1st.

Please share, thank you 🙏

2 weeks ago
Measuring naturalistic speech comprehension in real time

Excited to share our new publication, “Measuring Naturalistic Speech Comprehension in Real Time”!

➡️ rdcu.be/fa3hk #psynomBRM

w/ @kriesjill.bsky.social, Shiven Gupta, Maria Papworth Burrel, & @lauragwilliams.bsky.social

🧵1/11

2 weeks ago

Disturbing anecdotal reports of "AI psychosis" and negative psychological effects have been emerging in the news. But what actually happens during these lengthy delusional "spirals"? In our preprint, we analyze chat logs from 19 users who experienced severe psychological harm🧵👇

1 month ago
Representation Biases: Variance Is Not Always a Good Proxy for Importance. A central approach in neuroscience is to analyze neural representations as a means to understand a system's function, through the use of methods like principal component analysis, regression, and repr...

Pleased to share that our paper "Representation Biases: Variance Is Not Always a Good Proxy for Importance" is now out as a Theory/New Concepts paper in eNeuro!
www.eneuro.org/content/13/3... 1/
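
For intuition, here's a tiny numpy toy sketch (my own illustration, not code from the paper): a high-variance dimension dominates PCA even when a low-variance dimension carries all the task signal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
task = rng.standard_normal(n)             # task-relevant signal, unit variance
nuisance = 10 * rng.standard_normal(n)    # task-irrelevant, 100x the variance
X = np.column_stack([nuisance, task])     # toy 2D "neural" representation

# PCA via SVD: the top component aligns with the high-variance nuisance axis
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
print("top PC loadings:", Vt[0].round(2))  # ~[1, 0]: points along the nuisance axis

# ...yet the low-variance dimension carries all the task information
pc1 = Xc @ Vt[0]
print("corr(PC1, task) :", round(abs(np.corrcoef(pc1, task)[0, 1]), 2))
print("corr(dim2, task):", round(abs(np.corrcoef(X[:, 1], task)[0, 1]), 2))
```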

1 month ago

Can LLMs use ToM to genuinely persuade you, or do they just use good rhetoric? In our new preprint, we use the MINDGAMES framework to test this. Surprisingly, LLMs like o3 can be incredibly effective persuaders *without* actually understanding your mental states. 🧵👇

1 month ago

🚨New preprint! In-context learning underlies LLMs’ real-world utility, but what are its limits? Can LLMs learn completely novel representations in-context and flexibly deploy them to solve tasks? In other words, can LLMs construct an in-context world model? Let’s see! 👀

1 month ago

How do diverse context structures reshape representations in LLMs?
In our new work, we explore this via representational straightening. We find that LLMs are like a Swiss Army knife: they select different computational mechanisms, reflected in different representational structures. 1/
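
Straightening is usually quantified via the curvature of a hidden-state trajectory; here's a minimal sketch of one such measure (my own toy illustration, assuming the standard mean-turning-angle definition, not necessarily the paper's exact metric).

```python
import numpy as np

def mean_curvature(traj):
    """traj: (T, d) array of hidden states; mean turning angle (deg) along the path."""
    diffs = np.diff(traj, axis=0)
    diffs = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    cos = np.clip(np.sum(diffs[:-1] * diffs[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

rng = np.random.default_rng(0)
straight = np.outer(np.arange(10.0), rng.standard_normal(16))  # straight-line path
curved = straight + 0.5 * rng.standard_normal(straight.shape)  # jittered path

print(f"straight trajectory: {mean_curvature(straight):5.1f} deg")
print(f"curved trajectory:   {mean_curvature(curved):5.1f} deg")
```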

2 months ago

Why don’t neural networks learn all at once, but instead progress from simple to complex solutions? And what does “simple” even mean across different neural network architectures?

Sharing our new paper at @iclr_conf, led by Yedi Zhang with Peter Latham

arxiv.org/abs/2512.20607

2 months ago
Representations in language models can change dramatically over a conversation. Conceptual overview: left, a simulated conversation between a user and a model; right, a plot of the model's linear representations of the factuality of answers to questions like "do you have qualia" over the conversation. Answers that start factual flip to non-factual over the conversation, and vice versa.

New paper! In arxiv.org/abs/2601.20834 we study how language models' representations of things like factuality evolve over a conversation. We find that in edge-case conversations, e.g. about model consciousness or delusional content, model representations can change dramatically! 1/
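
For readers curious about the recipe, here's a hedged sketch of a standard linear-probe setup (my reconstruction with made-up activations and a hypothetical w_true direction, not the paper's code): fit a "factuality" probe on labeled hidden states, then score the same probe at each conversational turn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64
w_true = rng.standard_normal(d)               # hypothetical "factuality" direction

# Labeled activations for probe training (stand-ins for real hidden states)
X = rng.standard_normal((500, d))
y = (X @ w_true > 0).astype(int)
probe = LogisticRegression(max_iter=1000).fit(X, y)

# Activations for one answer, re-extracted at successive turns; here we drift
# them along -w_true to mimic a representation flipping over the conversation
h0 = rng.standard_normal(d)
for t in range(6):
    h = h0 - t * 0.5 * w_true
    print(f"turn {t}: P(factual) = {probe.predict_proba([h])[0, 1]:.2f}")
```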

2 months ago

@summerfieldlab.bsky.social and I are very happy to share this paper! Building on work by @scychan.bsky.social, we show that how people learn depends on the distribution of examples they see, and changes in a way that closely parallels transformer models.

3 months ago

I think that if you hypothesize that learning may dominate (aspects of) what the system acquires, then they can be useful as models of that portion of the process — bearing in mind that like any model (organism), they are wrong. They offer a way of testing hypotheses about 1/3

4 months ago

New: Eric Bigelow @ericbigelow.bsky.social suggests that the two main ways of controlling LLMs (prompting & steering) can be understood as changing model beliefs (as in Bayesian belief updating)

"Belief Dynamics Reveal the Dual Nature of In-Context Learning & Activation Steering"

arxiv.org/pdf/2511.00617
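
A toy sketch of that framing (my own illustration with made-up hypotheses and likelihoods, not the paper's method): prompting conditions a posterior over latent concepts, while steering nudges the belief directly.

```python
import numpy as np

hypotheses = ["formal", "casual"]              # hypothetical latent styles
belief = np.array([0.5, 0.5])                  # prior before any context
# P(evidence | hypothesis), numbers assumed purely for illustration
likelihood = {"formal_cue": np.array([0.8, 0.2]),
              "casual_cue": np.array([0.2, 0.8])}

# Prompting: each in-context cue conditions the belief (posterior ∝ likelihood x prior)
for cue in ["formal_cue", "formal_cue", "casual_cue"]:
    belief = belief * likelihood[cue]
    belief = belief / belief.sum()
    print(cue, "->", {h: float(b) for h, b in zip(hypotheses, belief.round(3))})

# Steering, on this toy view, shifts the belief directly rather than via evidence
belief = np.clip(belief + np.array([0.1, -0.1]), 0, None)
belief = belief / belief.sum()
print("steer ->", {h: float(b) for h, b in zip(hypotheses, belief.round(3))})
```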

4 months ago
Cracking the code of why, when some choose to ‘self-handicap’ — Harvard Gazette. New research also offers hints for devising ways to stop students from creating obstacles to success.

The Harvard Gazette has a nice story on my student @yangxiang.bsky.social and her work with @tobigerstenberg.bsky.social
news.harvard.edu/gazette/stor...

4 months ago
Aligning machine and human visual representations across abstraction levels - Nature. Aligning foundation models with human judgments enables them to more accurately approximate human behaviour and uncertainty across various levels of visual abstraction, while additionally improving th...

What aspects of human knowledge do vision models like CLIP fail to capture, and how can we improve them? We suggest models miss key global organization; aligning them makes them more robust. Check out Lukas Muttenthaler's work, finally out (in Nature!?) www.nature.com/articles/s41... + our blog! 1/3

5 months ago

In LLMs, concepts aren’t static: they evolve through time and have rich temporal dependencies.

We introduce Temporal Feature Analysis (TFA) to separate what's inferred from context vs. novel information. A big effort led by @ekdeepl.bsky.social, @sumedh-hindupur.bsky.social, @canrager.bsky.social!

5 months ago

Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really.

Our Temporal Feature Analyzer discovers contextual features in LLMs that detect event boundaries, parse complex grammar, and represent ICL patterns.

5 months ago

Excited to have this out! This was a fun project that started with YingQiao and me discussing whether VLMs can do mental simulation of physics like people do, and it culminated in a new method where we prompted image generation models to simulate a series of images frame by frame.

5 months ago

🚨 NEW PREPRINT: Multimodal inference through mental simulation.

We examine how people figure out what happened by combining visual and auditory evidence through mental simulation.

Paper: osf.io/preprints/ps...
Code: github.com/cicl-stanfor...

7 months ago

🚨New paper out w/ @gershbrain.bsky.social & @fierycushman.bsky.social from my time @Harvard!

Humans are capable of sophisticated theory of mind, but when do we use it?

We formalize & document a new cognitive shortcut: belief neglect, i.e., inferring others' preferences as if their beliefs were correct 🧵
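
A toy contrast in plain Python (my own illustration with assumed numbers, not the paper's model): full theory of mind marginalizes over what the agent might believe, while belief neglect conditions on the world as it actually is.

```python
# Box A truly holds an apple, but the agent may falsely believe it holds an orange
p_false_belief = 0.3
prefs = ["apple", "orange"]
prior = {p: 0.5 for p in prefs}

def p_choose_A(pref, believed_content):
    """Softly rational agent: likely picks box A iff it thinks A holds its favorite."""
    return 0.9 if believed_content == pref else 0.1

# Full theory of mind: marginalize over what the agent might believe
full = {p: prior[p] * ((1 - p_false_belief) * p_choose_A(p, "apple")
                       + p_false_belief * p_choose_A(p, "orange")) for p in prefs}
Z = sum(full.values()); full = {p: round(v / Z, 2) for p, v in full.items()}

# Belief neglect: assume the agent's belief matches reality (A holds an apple)
neglect = {p: prior[p] * p_choose_A(p, "apple") for p in prefs}
Z = sum(neglect.values()); neglect = {p: round(v / Z, 2) for p, v in neglect.items()}

print("full ToM      :", full)     # tempered by uncertainty about the agent's belief
print("belief neglect:", neglect)  # overconfident preference inference
```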

7 months ago
Flyer for the event!

*Sharing for our department’s trainees*

🧠 Looking for insight on applying to PhD programs in psychology?

✨ Apply by Sep 25th to Stanford Psychology's 9th annual Paths to a Psychology PhD info-session/workshop to have all of your questions answered!

📝 Application: tinyurl.com/pathstophd2025

7 months ago
What do representations tell us about a system? A mouse with a scope and a neural network, each yielding a vector of activity patterns. Common analyses of neural representations: encoding models (relating activity to task features), comparing models via neural predictivity (e.g., a network's R² to mouse brain activity), and RSA (assessing brain-brain or model-brain correspondence using representational dissimilarity matrices).

In neuroscience, we often try to understand systems by analyzing their representations — using tools like regression or RSA. But are these analyses biased towards discovering a subset of what a system represents? If you're interested in this question, check out our new commentary! Thread:
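
For a concrete sense of the RSA workflow the commentary discusses, here's a minimal sketch (assumed-standard recipe with toy data, not the commentary's code): build each system's representational dissimilarity matrix, then rank-correlate the two.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
stimuli = rng.standard_normal((20, 5))             # 20 stimuli, 5 latent features

brain = stimuli @ rng.standard_normal((5, 100))    # toy "neural" responses
model = stimuli @ rng.standard_normal((5, 64))     # toy model activations

# RDM per system: pairwise correlation distance across stimuli (condensed form)
rdm_brain = pdist(brain, metric="correlation")
rdm_model = pdist(model, metric="correlation")

rho, _ = spearmanr(rdm_brain, rdm_model)           # rank-correlate the two RDMs
print(f"model-brain RSA: Spearman rho = {rho:.2f}")
```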

8 months ago

Super excited to have the #InfoCog workshop this year at #CogSci2025! Join us in SF for an exciting lineup of speakers and panelists, and check out the workshop's website for more info and a detailed schedule
sites.google.com/view/infocog...

8 months ago

Submit your latest and greatest papers to the hottest workshop on the block---on cognitive interpretability! 🔥

9 months ago
First Workshop on Interpreting Cognition in Deep Learning Models (NeurIPS 2025)

Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣

How can we interpret the algorithms and representations underlying complex behavior in deep learning models?

🌐 coginterp.github.io/neurips2025/

1/4

9 months ago

A bias for simplicity by itself does not guarantee good generalization (see the No Free Lunch Theorems). So an inductive bias is only good to the extent that it reflects structure in the data. Is the world simple? The success of deep nets (with their intrinsic Occam's razor) would suggest yes(?)

9 months ago

Hi thanks for the comment! I'm not too familiar with the robot-learning literature but would love to learn more about it!

9 months ago