We are inviting applications for a two-year postdoctoral position in a collaborative meta-science project on the effectiveness of data and code sharing policies in research-performing organizations. www.tue.nl/en/working-a...
Posts by Jamie Cummins
spend less on Nick Fuentes Superchats
All the more reason to provide strong methodological critiques of these methods _before_ they creep into the academic research literature!
Thanks so much Eiko!
10/
If the whole selling point of silicon samples is that they can be faster, cheaper, and more efficient to run than human data collection, but silicon samples require specific human data for their validation, one must ask: are silicon samples actually saving us any time or resources at all?
9/
As the preprint says at the very end: silicon samples are not magic. They are simply a methodology, and should be developed and evaluated as we would any methodology. But (i) there are often alternative approaches available, and (ii) it takes time and resources to validate this method.
8/
The revised paper offers some practical recommendations for how one might develop better silicon sampling methodologies. Specifically, I describe how a "configuration-confirmation" split approach could, in some cases, lead to better performance on specific, pre-specified features of evaluation.
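To make the idea concrete, here is a minimal sketch of one way such a split could look. Everything below (configuration labels, feature names, the scoring function) is an illustrative placeholder, not code from the preprint: you select the best configuration on one half of the pre-specified evaluation features and report its performance only on the held-out half.

```python
# Minimal sketch of one reading of a "configuration-confirmation" split.
# Configuration labels, feature names, and the scoring function are
# illustrative placeholders, NOT taken from the preprint.
import random

def correspondence(config: str, features: list[str]) -> float:
    """Placeholder for 'how well does this configuration's silicon sample
    reproduce the human data on these evaluation features?'"""
    random.seed(hash((config, tuple(features))) % 2**32)
    return random.uniform(0, 1)  # stand-in for a real correspondence metric

configs = ["gpt-3.5 / t=1.0", "gpt-3.5 / t=1.5", "gpt-4o / t=1.0"]  # illustrative
features = [f"association_{i}" for i in range(20)]                  # illustrative

# Split the pre-specified evaluation features: tune on one half,
# report performance only on the held-out half.
random.seed(1)
random.shuffle(features)
config_set, confirm_set = features[:10], features[10:]

best = max(configs, key=lambda c: correspondence(c, config_set))
reported = correspondence(best, confirm_set)  # the number you actually report
print(best, round(reported, 2))
```

The point of the split is simply that the configuration you end up reporting was not chosen because it happened to fit the very features you then evaluate it on.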
7/
In other words: not only do silicon sampling choices seem not to generalise across different features of evaluation within the same data (i.e., Study 1's finding), but they also appear not to generalise in performance on similar tasks across different substantive domains.
6/
The best all-item configuration was GPT-3.5 Turbo, temperature = 1.5, narrative-profile prompt, full demographic info. It produced r = .84.
Interestingly, that configuration performed quite poorly in Study 1 for recovering associations between scales.
5/
Are the findings of Argyle et al. influenced by analytic flexibility? After re-running their silicon sampling approach with different choices of model, temperature/reasoning effort, prompt content, and demographic information provided, I say: yes.
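For anyone wondering what "different choices" means in practice, here is a minimal sketch (not the preprint's actual code; the model names, temperatures, and labels are purely illustrative) of the kind of configuration grid that gets varied and then compared against the human benchmark data:

```python
# Minimal sketch (not the preprint's actual code) of the analytic choices
# being varied: model, temperature, prompt style, and demographic detail.
# Each combination is one "silicon sampling" configuration to compare
# against the human benchmark data.
from itertools import product

models = ["gpt-3.5-turbo", "gpt-4o-mini"]           # illustrative model choices
temperatures = [0.7, 1.0, 1.5]                      # sampling temperatures
prompt_styles = ["interview", "narrative-profile"]  # how the persona is phrased
demographic_detail = ["minimal", "full"]            # how much respondent info is given

configurations = list(product(models, temperatures, prompt_styles, demographic_detail))
print(f"{len(configurations)} configurations to run and compare against the human data")
```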
4/
I looked at this in the context of Argyle et al. (2023)'s third study, which examined whether silicon samples can effectively recover patterns of association among political/demographic variables using the ANES 2016 Time Series Study data.
3/
I originally demonstrated how flexibility could affect estimates in a specific case study. But reviewers asked: how does flexibility affect a published result?
2/
The main new additions:
- A revised title!
- A new second study that demonstrates the impact of analytic flexibility on an already-published, highly cited (>1300 times) silicon sample study (Argyle et al. 2023 Study 3).
- A new exploration of how we might use these samples more robustly.
1/
"Silicon samples" are becoming more and more common in research and polling.
One problem: depending on the analytic decisions made, you can basically get these samples to show any effect you want.
The updated version of this preprint is now online!
THREAD🧵
arxiv.org/abs/2509.13397
Plus, the data that comes from AI is "weird in ways we don't understand." (Bisbee) Research from metascientist Jamie Cummins finds that "...a very small number of decisions can dramatically change the correspondence between silicon samples and human data."
Since last December, I have been obsessed with how AI has influenced polling and survey research. Here is the tl;dr:
1- Some researchers are using "silicon respondents" (AI bots) to simulate human responses
2- AI bots have gotten so advanced they can evade detection in some online surveys
Participants were explicitly informed that there was no penalty for providing wrong answers, their payment didn't depend on how many questions they solved correctly, and they were requested to do the task to the best of their abilities.
lmfao
I’m sorry but this is an absolutely insane conclusion for the authors to draw given what they actually did
I’m hiring a PhD student!
The candidate will work alongside @zefreeman.bsky.social, who is joining our research group as a postdoc.
jobs.unibe.ch/job-vacancie...
me on the TED stage
Me giving the TED talk
Me, still talkin'
I gave a TED talk today!
New post on The 100% CI: Science needs downvotes.
www.the100.ci/2026/04/13/s...
In which I make the case that grant funders should add funding lines that include a module for bug bounties.
New blog posts, new mistakes, new (to me) kinds of error! Recently, @schmukle.bsky.social was confused about a plot in one of my papers. This usually means I messed up bad. After a couple-month-long investigation, I can now report: I messed up, but fortunately it was fairly inconsequential.
You know, I really feel that living in Switzerland has increased my German-ness, not decreased it. Clearly baseline German-ness is an important factor in Switzerland's effect!
The ERROR project recruits independent experts to recheck social science papers' data, statistics, methodology, and code; now the project plans to publish the reviews in a new peer-reviewed journal.
science.org/content/arti...
@dalmeet.bsky.social @science.org
#reproducibility
Yeah, OK, that reads.
Super happy to be giving a keynote talk at this timely and topical conference. I will talk about our recent efforts to build and evaluate workflows to help compare study registrations and papers in psychology, medicine, economics, and preclinical trials.
Come join us in Eindhoven!
Hope to see you at the Conference on Transparency, Technology and AI in Peer Review on June 5, 2026, which is organized in collaboration with the Center for Humans and Technology at Eindhoven University of Technology.
Register at: meta-eindhoven.github.io/events/
Fun facts
- The original study reported 23 IQ point gains
- Jordan Peterson tweeted about it (sort of)
- @jamiecummins.bsky.social's new preregistered RCT shows null effects
A failure to find effects of relational operant training on scholastic aptitude of school children: A randomised controlled trial: https://osf.io/cauvz