🚨New preprint and our results are rather concerning…
We find the "boiling frog" equivalent of AI use. Using large-scale RCTs, we provide *causal* evidence that AI assistance reduces persistence and hurts independent performance.
And these effects emerge after just 10–15 minutes of AI use!
1/
Are the laws of thought woven from three golden threads?
Tom @cocoscilab.bsky.social and I discuss some of the history and themes in logic, probability, and neural nets, in his new book The Laws of Thought.
braininspired.co/podcast/233/
Sycophantic AI distorts reality by returning responses that are biased to reinforce existing beliefs.
"sycophantic AI distorts belief, manufacturing certainty where there should be doubt."
Unbiased sampling produces discovery rates 5X higher! arxiv.org/pdf/2602.14270
11/ Paper and code: arxiv.org/abs/2603.12229. Thanks to my amazing collaborators: Katie Collins, @sucholutsky.bsky.social, @natvelali.bsky.social, @cocoscilab.bsky.social !!
10/ The stakes are high 📈Poorly coordinated LLM teams won't just underperform; they'll compound errors and waste resources at scale. We hope this work helps move LLM team design toward systems that are not just more capable, but more predictable, efficient, and responsible.
9/ The broader point: not all of the challenges facing LLM teams today are new or mysterious 🔍 Problems like scalability, consistency, stragglers, and fault tolerance are well-characterized with well-studied solutions.
8/ We also document an under-appreciated cost tradeoff: token usage often scales faster than speedup. A team that is faster in wall-clock time may still be less efficient once compute costs are factored in, especially when agents communicate with each other.
7/ However, decentralized teams exhibited one key advantage: robustness to stragglers. When task assignments were flexible, faster agents could dynamically take on the work of slower ones, a natural analogue to replication strategies like MapReduce.
6/ Second prediction: centralized and decentralized architectures have distinct failure modes. Decentralized teams showed significantly more consistency conflicts, test failures, and communication overhead, exactly as distributed systems anticipate.
5/ First prediction: Amdahl's Law. In distributed computing, speedup from parallelization is bounded by the serial fraction of a task. We find the same holds empirically for LLM teams across all three models, although some come much closer to this bound than others.
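The bound in 5/ is a one-line formula, so it's easy to play with. A minimal sketch (the function name and the 20% serial fraction are illustrative, not taken from the paper):

```python
def amdahl_speedup(n_workers: int, serial_fraction: float) -> float:
    """Amdahl's Law: upper bound on speedup from parallelizing a task
    across n_workers, when serial_fraction of it cannot be parallelized."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)

# A task that is 20% inherently serial caps out at 5x speedup,
# no matter how many agents you add:
for n in (1, 2, 5, 100):
    print(n, round(amdahl_speedup(n, 0.2), 2))
```

Even with 100 workers the speedup stays below 1/0.2 = 5x, which is why the serial fraction, not team size, dominates.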
4/ To test these predictions, we had teams of 1-5 LLM agents (Claude Sonnet 4.6, Gemini 3 Flash, or GPT-5.2) collaborate on a suite of coding problems with varying task and team structures. 🤖💻
3/ LLM teams share four key properties with distributed systems: independence, concurrency, communication, and fallibility 🔗This formal correspondence lets us borrow decades of theory and generate testable predictions about LLM team behavior.
2/ The promise that LLM teams could extend capabilities beyond any single model is compelling 🤝 But the reality is messier: performance is inconsistent, costly, and unpredictable. We need a formal framework for design and deployment 💡Luckily, distributed computing has one.
🚨New preprint! LLM teams are being deployed at scale, yet we lack the tools to predict when they’ll succeed, fail, or how to design them. Distributed computing faced the exact same questions and figured out how to answer them. We show those insights apply directly to LLMs 🧵👇
Nice!
Wow thank you so much! 😊
The visual world is composed of objects, and those objects are composed of features. But do VLMs exploit this compositional structure when processing multi-object scenes? In our 🆒🆕 #ICLR2026 paper, we find they do – via emergent symbolic mechanisms for visual binding. 🧵👇
Are you a grad student who wants to give a talk at Princeton’s psychology department (in-person or on Zoom)?
Nominate yourself or someone you know: forms.gle/WN2ybYMuZiW3...
Priority given to non-Ivy and URM students. International applicants welcome.
Deadline is this Friday (Feb 6)!
Excited to announce a new book telling the story of mathematical approaches to studying the mind, from the origins of cognitive science to modern AI! The Laws of Thought will be published in February and is available for pre-order now.
I'm looking for two PhD students for Fall 2026. Both on multi-agent reinforcement learning (MARL).
- Theory of MARL: experience with theory and/or MARL
- Formal methods for MARL: experience with formal methods or MARL (interest in learning the other)
www.khoury.northeastern.edu/programs/com...
CONGRATS so so well-deserved! 🥳🙌
Honored and excited to share that I am the winner of the Nomis & Science Young Explorer Award!!
Also thrilled to share that my article describing my research is out now in @science.org today!
The normalization of (almost) everything www.science.org/doi/10.1126/...
1/
Do AI agents ask good questions? We built “Collaborative Battleship” to find out—and discovered that weaker LMs + Bayesian inference can beat GPT-5 at 1% of the cost.
Paper, code & demos: gabegrand.github.io/battleship
Here's what we learned about building rational information-seeking agents... 🧵🔽
Honored to contribute a Journal Club piece to @natrevpsychol.nature.com!
I explore Robert White's seminal "competence motivation" framework (1959) and why it remains relevant over 60 years later—from why toddlers insist on doing things themselves to designing intrinsically motivated AI. 🤖
I'm recruiting grad students!! 🎓
The CoDec Lab @ NYU (codec-lab.github.io) is looking for PhD students (Fall 2026) interested in computational approaches to social cognition & problem solving 🧠
Applications through Psych (tinyurl.com/nyucp) are due Dec 1. Reach out with Qs & please repost! 🙏
Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")?
Our new paper shows AI which models others’ minds as Python code 💻 can quickly and accurately predict human behavior!
shorturl.at/siUYI
Timely reminder that 'associative' language leads lay people to confuse correlation and causation, as @tomerullman.bsky.social and I showed a few years ago.
journals.plos.org/plosone/arti...
Snapshot from the BBC:
www.bbc.com/news/article...
Can large language models stand in for human participants?
Many social scientists seem to think so, and are already using "silicon samples" in research.
One problem: depending on the analytic decisions made, you can basically get these samples to show any effect you want.
THREAD 🧵
Our new lab for Human & Machine Intelligence is officially open at Princeton University!
Consider applying for a PhD or Postdoc position, either through Computer Science or Psychology. You can register interest on our new website lake-lab.github.io (1/2)