
Posts by Elizabeth Mieczkowski

Post image

🚨New preprint, and our results are rather concerning...

We find the "boiling frog" equivalent in AI use. Using large-scale RCTs, we provide *causal* evidence that AI assistance reduces persistence and hurts independent performance.

And these effects emerge after just 10–15 minutes of AI use!

1/

2 weeks ago 1513 680 27 74
Post image

Are the laws of thought woven from three golden threads?

Tom @cocoscilab.bsky.social and I discuss some of the history and themes in logic, probability, and neural nets, in his new book The Laws of Thought.

braininspired.co/podcast/233/

1 month ago 7 3 0 0
Post image

Sycophantic AI distorts reality by returning responses that are biased to reinforce existing beliefs.

"sycophantic AI distorts belief, manufacturing certainty where there should be doubt."

Unbiased sampling produces discovery rates 5X higher! arxiv.org/pdf/2602.14270

1 month ago 29 14 1 2
Preview
Language Model Teams as Distributed Systems Large language models (LLMs) are growing increasingly capable, prompting recent interest in LLM teams. Yet, despite increased deployment of LLM teams at scale, we lack a principled framework for addre...

11/ Paper and code: arxiv.org/abs/2603.12229. Thanks to my amazing collaborators: Katie Collins, @sucholutsky.bsky.social, @natvelali.bsky.social, @cocoscilab.bsky.social !!

1 month ago 3 0 0 1

10/ The stakes are high 📈Poorly coordinated LLM teams won't just underperform; they'll compound errors and waste resources at scale. We hope this work helps move LLM team design toward systems that are not just more capable, but more predictable, efficient, and responsible.

1 month ago 1 0 1 0

9/ The broader point: not all of the challenges facing LLM teams today are new or mysterious 🔍 Problems like scalability, consistency, stragglers, and fault tolerance are well-characterized with well-studied solutions.

1 month ago 3 0 1 0

8/ We also document an under-appreciated cost tradeoff: token usage often scales faster than speedup. A team that is faster in wall-clock time may still be less efficient once compute costs are factored in, especially when agents communicate with each other.
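The tradeoff above can be made concrete with a toy calculation (an illustration, not a metric from the paper; the function name and numbers are hypothetical):

```python
def cost_adjusted_efficiency(speedup: float, token_ratio: float) -> float:
    """Wall-clock speedup per unit of extra token spend.

    token_ratio: team tokens / single-agent tokens (inter-agent
    communication inflates this). Values below 1.0 mean the team is
    faster in wall-clock time but less compute-efficient overall.
    """
    return speedup / token_ratio

# Hypothetical numbers: a 2x speedup that costs 3x the tokens is a
# net efficiency loss once compute is priced in.
print(round(cost_adjusted_efficiency(2.0, 3.0), 2))  # 0.67
```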

1 month ago 3 0 1 0
Post image

7/ However, decentralized teams exhibited one key advantage: robustness to stragglers. When task assignments were flexible, faster agents could dynamically take on the work of slower ones, a natural analogue to replication strategies like MapReduce.
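A minimal simulation of why flexible assignment helps (my own sketch, not the paper's setup; task costs and agent speeds are made up):

```python
import heapq

def makespan(task_costs, agent_speeds, dynamic=True):
    """Wall-clock finish time for a team of agents.

    dynamic=True: agents pull tasks from a shared queue, so fast agents
    absorb a straggler's backlog. dynamic=False: fixed round-robin split.
    """
    finish = [0.0] * len(agent_speeds)
    if dynamic:
        # Next free agent takes the next task (greedy pull).
        heap = [(0.0, i) for i in range(len(agent_speeds))]
        heapq.heapify(heap)
        for cost in task_costs:
            t, i = heapq.heappop(heap)
            t += cost / agent_speeds[i]
            finish[i] = t
            heapq.heappush(heap, (t, i))
    else:
        # Static assignment: straggler keeps its full share.
        for k, cost in enumerate(task_costs):
            i = k % len(agent_speeds)
            finish[i] += cost / agent_speeds[i]
    return max(finish)

tasks = [1.0] * 8
speeds = [1.0, 1.0, 0.25]  # one straggler at quarter speed
print(makespan(tasks, speeds, dynamic=False))  # 8.0 (straggler dominates)
print(makespan(tasks, speeds, dynamic=True))   # 4.0 (its work is absorbed)
```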

1 month ago 2 0 1 0
Post image

6/ Second prediction: centralized and decentralized architectures have distinct failure modes. Decentralized teams showed significantly more consistency conflicts, test failures, and communication overhead, exactly as distributed systems theory anticipates.

1 month ago 2 0 2 0
Post image

5/ First prediction: Amdahl's Law. In distributed computing, speedup from parallelization is bounded by the serial fraction of a task. We find the same holds empirically for LLM teams across all three models, although some come much closer to this bound than others.

1 month ago 2 0 1 0

4/ To test these predictions, we had teams of 1-5 LLM agents (Claude Sonnet 4.6, Gemini 3 Flash, or GPT-5.2) collaborate on a suite of coding problems with varying task and team structures. 🤖💻

1 month ago 2 0 1 0
Post image

3/ LLM teams share four key properties with distributed systems: independence, concurrency, communication, and fallibility 🔗This formal correspondence lets us borrow decades of theory and generate testable predictions about LLM team behavior.

1 month ago 2 0 1 0

2/ The promise that LLM teams could extend capabilities beyond any single model is compelling 🤝 But the reality is messier: performance is inconsistent, costly, and unpredictable. We need a formal framework for design and deployment 💡Luckily, distributed computing has one.

1 month ago 2 0 1 0
Post image

🚨New preprint! LLM teams are being deployed at scale, yet we lack the tools to predict when they’ll succeed or fail, or how to design them. Distributed computing faced the exact same questions and figured out how to answer them. We show those insights apply directly to LLMs 🧵👇

1 month ago 30 3 1 1

Nice!

1 month ago 1 0 0 0
Preview
We Automated RL Environment Engineering for $10 RL environment simulation eats 50-90% of training wall-clock for specialist RL policies. Coding agents can translate them automatically with no sim-to-sim gap.

open.substack.com/pub/sethkart...

1 month ago 6 4 2 0

Wow thank you so much! 😊

2 months ago 1 0 0 0
Post image

The visual world is composed of objects, and those objects are composed of features. But do VLMs exploit this compositional structure when processing multi-object scenes? In our 🆒🆕 #ICLR2026 paper, we find they do – via emergent symbolic mechanisms for visual binding. 🧵👇

2 months ago 83 26 1 3
Preview
Emerging Scholars in Psychological Science Speaker Nominations Nominate yourself or another late-stage PhD student to speak at Princeton's Department of Psychology this academic semester (Spring 2026). The Emerging Scholars in Psychological Science (ESPS) talk ...

Are you a grad student who wants to give a talk at Princeton’s psychology department (in-person or on Zoom)?

Nominate yourself or someone you know: forms.gle/WN2ybYMuZiW3...

Priority given to non-Ivy and URM students. International applicants welcome.

Deadline is this Friday (Feb 6)!

2 months ago 9 10 0 0
Post image

Excited to announce a new book telling the story of mathematical approaches to studying the mind, from the origins of cognitive science to modern AI! The Laws of Thought will be published in February and is available for pre-order now.

4 months ago 167 39 2 7
Preview
PhD in Computer Science - Khoury College of Computer Sciences The PhD in Computer Science program will prepare you with advanced knowledge, industry opportunities, and research experience to be a leader in the field.

I'm looking for two PhD students for Fall 2026, both working on multi-agent reinforcement learning (MARL).

- Theory of MARL: experience with theory and/or MARL

- Formal methods for MARL: experience with formal methods or MARL (interest in learning the other)

www.khoury.northeastern.edu/programs/com...

5 months ago 6 4 1 0

CONGRATS so so well-deserved! 🥳🙌

5 months ago 2 0 0 0
Preview
The normalization of (almost) everything: Our minds can get used to anything, and even crises start feeling normal

Honored and excited to share that I am the winner of the Nomis & Science Young Explorer Award!!

Also thrilled to share that my article describing my research is out now in @science.org today!

The normalization of (almost) everything www.science.org/doi/10.1126/...

1/

5 months ago 85 21 3 2
Video

Do AI agents ask good questions? We built “Collaborative Battleship” to find out—and discovered that weaker LMs + Bayesian inference can beat GPT-5 at 1% of the cost.

Paper, code & demos: gabegrand.github.io/battleship

Here's what we learned about building rational information-seeking agents... 🧵🔽
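The core idea behind "good questions" can be sketched with a generic expected-information-gain calculation (a textbook illustration, not the paper's actual method or code):

```python
import math

def eig(hypotheses, question):
    """Expected information gain (bits) of a yes/no question,
    assuming a uniform prior over the remaining hypotheses.

    question: callable hypothesis -> bool (the answer it would get).
    """
    n = len(hypotheses)
    yes = sum(1 for h in hypotheses if question(h))
    no = n - yes
    if yes == 0 or no == 0:
        return 0.0  # uninformative: the answer is already certain
    expected_posterior_entropy = (yes / n) * math.log2(yes) + (no / n) * math.log2(no)
    return math.log2(n) - expected_posterior_entropy

# Toy "Battleship": the hidden ship is in one of columns 0..3.
ships = [0, 1, 2, 3]
print(round(eig(ships, lambda c: c < 2), 2))   # 1.0  (even split: 1 full bit)
print(round(eig(ships, lambda c: c == 0), 2))  # 0.81 (lopsided: less informative)
```

A rational agent asks the question maximizing this quantity, which is how a weak LM paired with Bayesian inference can out-question a much larger model.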

5 months ago 25 11 1 2

Honored to contribute a Journal Club piece to @natrevpsychol.nature.com!

I explore Robert White's seminal "competence motivation" framework (1959) and why it remains relevant over 60 years later—from why toddlers insist on doing things themselves to designing intrinsically motivated AI. 🤖

6 months ago 12 4 1 1
codec lab

I'm recruiting grad students!! 🎓

The CoDec Lab @ NYU (codec-lab.github.io) is looking for PhD students (Fall 2026) interested in computational approaches to social cognition & problem solving 🧠

Applications through Psych (tinyurl.com/nyucp) are due Dec 1. Reach out with Qs & please repost! 🙏

6 months ago 60 48 3 2
Post image

Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")?

Our new paper shows AI which models others’ minds as Python code 💻 can quickly and accurately predict human behavior!

shorturl.at/siUYI%F0%9F%...

6 months ago 38 14 3 5
Post image

Timely reminder that 'associative' language leads lay people to confuse correlation and causation, as @tomerullman.bsky.social and I showed a few years ago.
journals.plos.org/plosone/arti...

Snapshot from the BBC:
www.bbc.com/news/article...

6 months ago 65 17 2 3
Preview
The threat of analytic flexibility in using large language models to simulate human data: A call to attention Social scientists are now using large language models to create "silicon samples" - synthetic datasets intended to stand in for human respondents, aimed at revolutionising human subjects research. How...

Can large language models stand in for human participants?
Many social scientists seem to think so, and are already using "silicon samples" in research.

One problem: depending on the analytic decisions made, you can basically get these samples to show any effect you want.

THREAD 🧵

7 months ago 343 159 12 61
Post image

Our new lab for Human & Machine Intelligence is officially open at Princeton University!

Consider applying for a PhD or Postdoc position, either through Computer Science or Psychology. You can register interest on our new website lake-lab.github.io (1/2)

7 months ago 54 14 1 0