New preprint out today (osf.io/preprints/ps...). We tested whether AI agents are actually infiltrating online surveys.
Spoiler alert: they aren't
Thread 🧵
[1/9]
Posts by Paul Clist
🧵1/ Our first meta-science paper (with 350+ coauthors) is published today in Nature. It presents one of the largest-ever reproducibility projects in economics & political science.
Here’s what we found 👇
This was fun - thanks to all the participants for great comments :)
Can AI detect the ten errors in Moretti 2021? I did a test of GPT5.2 vs GPT5.4 vs refine.
Takeaway: current reasoning LLMs are useful, with room for improvement.
1/
🚨Replication alert🚨
I'm pleased to announce that my replication of Moretti (2021) is now accepted as a comment at AER.
I find ten issues in the paper. My comment focuses on two major problems; in the appendix, I document eight (relatively) minor problems.
1/
Super excited about this...
I'm hiring a video editor to help bring economics to a mass audience (yes, really). If you're thoughtful, creative, and want to make complex ideas accessible, I'd love to see your work.
Apply here: docs.google.com/forms/d/e/1F...
Or share with your most amazing mates!
It's ironic to see a discipline care **so much** about unbiasedness (causal inference!) at the level of a single test but then have a research production system and culture that is basically a ferocious bias generation machine. This is not good.
Abstract: AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation - particularly in safety-critical domains.
‘Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition… We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.’
arxiv.org/pdf/2601.20245
Who leaked this Number 10 discussion to Jeffrey Epstein? And are there consequences for the leaker?
It’s an internal discussion re. getting markets moving in the aftermath of the financial crisis. No doubt of great interest to Epstein and his financial market clients.
🆕 There's a growing body of evidence on the common misperceptions people have about the world.
And it turns out that, across a bunch of different settings, correcting those misperceptions seems to be a very cheap way of improving society.
Here are some examples: voxdev.org/topic/common...
New evidence from Africa shows that aid reduces conflict when projects are well managed, but increases violence when management and monitoring are weak.
Read today's article to learn more:
The Weiss Fund has a great new initiative for development economists on the PhD job market to support those taking up research positions in LMICs, offering supplementary income + research funds. Please share!
Check out our new VoxDevLit on International Migration! Thanks to co-editors @catiabatista.bsky.social, @econgaurav.bsky.social, @dmckenzie.bsky.social, @mushfiq-econ.bsky.social, & Caroline Theoharides!
We look forward to working together on this "living literature review" in the years to come...
These economists are unsurpassed in research on migration & development. Global authorities.
Their new resource at @voxdev.bsky.social is a gift that will keep on giving —>
lolsob as I try for the 100th time to convince a biologist that differences in statistical significance are not significant
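The fallacy in that post is worth seeing with numbers. Here is a minimal sketch with made-up estimates (the coefficients and standard errors are invented for illustration): one estimate clears p < 0.05 and the other doesn't, yet a direct test of the difference between them is nowhere near significant.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z-statistic under the standard normal."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Two hypothetical effects with identical standard errors (invented numbers).
b1, se1 = 0.25, 0.10   # this one is "significant"
b2, se2 = 0.15, 0.10   # this one is "not significant"

p1 = two_sided_p(b1 / se1)   # ~0.012, below 0.05
p2 = two_sided_p(b2 / se2)   # ~0.134, above 0.05

# The comparison that actually matters: is b1 - b2 distinguishable from zero?
diff = b1 - b2
se_diff = math.sqrt(se1**2 + se2**2)   # SE of the difference (independent estimates)
p_diff = two_sided_p(diff / se_diff)   # ~0.48: no evidence the effects differ

print(round(p1, 3), round(p2, 3), round(p_diff, 3))
```

"Significant" and "not significant" is a statement about two separate null hypotheses, not about the gap between the estimates; that gap needs its own standard error and its own test.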
For folks at the AEA meetings...
Come hear us debate what we do and don't know about the impact of foreign aid.
Kicking off 2026 w/ a list of my favorite published dev papers from 2025 #econtwitter #econsky! (Favorite, not best, because best is hard to define - but I loved these papers + learned a lot from them. by "2025", I mean in a journal volume last year)
New in JCRE: The Common Problem of Bad Controls in Tests of the Linguistic Savings Hypothesis. A Comment on Ayres et al. (PNAS, 2023) and related literature by Paul Clist @paulclist.bsky.social jcr-econ.org/the-common-p...
Many thanks to my excellent coauthor www.yingyihong.org
As an aside, one of the three papers that spread the idea is Ariely & Gino (12), which I don't think has been retracted, but we discuss some 'interesting' data patterns in our appendix, most notably identical distributions in two treatments
At the suggestion of a referee we test mixture models to see if anyone is following JD. We don't find significant evidence they are. Models without JD offer better fit.
So whilst JD is a neat theory, there isn't anything special about counterfactuals. Standard lying models work quite well.
we find that whilst it is a neat theory, it doesn't seem to be a good explanation of what's going on. We test this by 1) running a placebo test, where JD's predictions fit behaviour *really well* when they shouldn't, and 2) asking for the second roll and testing a corollary of JD. It doesn't pass.
Dice games are a popular way of measuring lying and cheating. There's a neat theory, called Justified Dishonesty, where people who observe counterfactuals 'swap' rolls, as they can cheat but feel honest.
We explore that idea here:
www.sciencedirect.com/science/arti...
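The thread above can be illustrated with a toy simulation. This is a hedged sketch, not the paper's estimation code: the sample size and the 30% lying share are invented, and the two reporting rules are stylised versions of Justified Dishonesty (report the better of the two rolls) and a standard partial-lying model (some subjects report the maximum payoff, the rest are honest).

```python
import random

random.seed(0)
N = 100_000  # hypothetical number of subjects (not from the paper)

# Each subject privately rolls a die twice; only the first roll "counts" for pay.
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(N)]

# Justified Dishonesty: report the better of the two rolls,
# i.e. "swap" in the counterfactual second roll when it is higher.
jd_reports = [max(r1, r2) for r1, r2 in rolls]

# Standard partial-lying model: a made-up 30% of subjects report the
# maximum payoff (6) outright; everyone else reports the first roll honestly.
p_lie = 0.30
std_reports = [6 if random.random() < p_lie else r1 for r1, r2 in rolls]

mean_jd = sum(jd_reports) / N    # E[max of two d6] = 161/36 ≈ 4.47
mean_std = sum(std_reports) / N  # 0.3 * 6 + 0.7 * 3.5 = 4.25
honest_mean = 3.5                # E[one fair d6]
print(round(mean_jd, 2), round(mean_std, 2))
```

Both models push the reported mean well above the honest 3.5, which is why the mean alone can't separate them; distinguishing them takes the full reported distribution, mixture models, or (as in the paper) eliciting the second roll and running placebo tests.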
Text reads: About synthetic panels Recruiting the right participants for a study can be difficult. You may not get the exact demographics you need, and the shorter the deadline, the less sure you can be that everyone will answer on time. One possible solution can be to use synthetic panels. Synthetic panels are powered by a first party proprietary AI model developed here at Qualtrics. Our synthetic panel is trained on thousands of responses from a variety of demographic backgrounds in order to more accurately predict how certain populations would respond to a survey. Our synthetic panel is based on the United States General Population, and is only available in English. This panel comes with ready-made quotas and target breakouts in order to represent your chosen population and make it easy to launch your survey right away.
Text reads: Question-writing best practices To get the most reliable and actionable results from synthetic audiences, consider these question-writing best practices: Ask forward-looking and attitudinal questions. Synthetic panels perform best with perceptions, preferences, and intent-based questions. For example, “How likely are you to try…?” Synthetic panels are less applicable for studies on past behaviors, detailed recall, brand recall, or awareness questions. For example, “When did you last visit…?”
Text reads: Discussion The current study aimed to conduct a meta-analysis of the TPB when applied to health behaviours which addressed the limitations of previous reviews by including only prospective tests of behaviour, applying RE meta-analytic procedures, correcting correlations for sampling and measurement error, and hierarchically analysing the effect of behaviour type and sample and methodological moderators. Some 237 tests were identified which examined relations amongst model components. Overall the analysis indicated that the TPB could explain 19.3% of the variance in behaviour and 44.3% of the variance in intention across studies. This level of prediction of behaviour is slightly lower than that of previous meta-analytic reviews which have found between 27% (Armitage & Conner, 2001; Hagger et al., 2002) and 36% (Trafimow et al., 2002) of the variance in behaviour to be explained by intention and PBC.
Did you know that from tomorrow, Qualtrics is offering synthetic panels (AI-generated participants)?
Follow me down a rabbit hole I'm calling "doing science is tough and I'm so busy, can't we just make up participants?"
This is really good news for thousands of students - ERASMUS is a fantastic programme that UK students should have kept all along. It provides opportunities that, without funding, many students could never afford. It can only be a net positive.
Free the Best Buys!
www.cgdev.org/blog/fcdos-b...
Simulating from and checking a model in Stan: it's so easy in Stan Playground - it just runs in your browser!
statmodeling.stat.columbia.edu/2025/12/15/s...
Experimental economics now has a substantial track record
#econsky #academicsky
marketdesigner.blogspot.com/2025/12/nott...
Wired: two article proofs to check, received on the same afternoon
Tired: In the week before Christmas, whilst packing up my office with a cold