
Posts by Jamie Cummins

Postdoc in Meta-science. Personnel type: Scientific staff

We are inviting applications for a two-year postdoctoral position in a collaborative meta-science project on the effectiveness of data and code sharing policies in research-performing organizations. www.tue.nl/en/working-a...

5 days ago

spend less on Nick Fuentes Superchats

1 day ago

All the more reason to provide strong methodological critiques of these methods _before_ they creep into the academic research literature!

1 day ago

Thanks so much Eiko!

1 day ago
The threat of analytic flexibility in using large language models to simulate human data. Social scientists are now using large language models to create "silicon samples": synthetic datasets intended to stand in for human respondents. However, producing these samples requires many analyti...

11/
That's a wrap! Questions and comments welcome as always.
arxiv.org/abs/2509.13397

1 day ago

10/
If the whole selling point of silicon samples is that they can be faster, cheaper, and more efficient to run than human data collection, but silicon samples require specific human data for their validation, one must ask: are silicon samples actually saving us any time or resources at all?

1 day ago

9/
As the preprint says at the very end: silicon samples are not magic. They are simply a methodology, and should be developed and evaluated as we would any methodology. But (i) there are often alternative approaches available, and (ii) it takes time and resources to validate this method.

1 day ago

8/
The revised paper offers some practical recommendations for how one might develop better silicon sampling methodologies. Specifically, I describe how a "configuration-confirmation" split approach could, in some cases, lead to better performance on specific, pre-specified features of evaluation.

1 day ago

7/
In other words: not only do silicon sampling choices seem not to generalise across different data features of evaluation (i.e., Study 1's finding), but they also appear not to generalise in performance on similar tasks across different substantive domains.

1 day ago

6/
The best all-item configuration was GPT-3.5 Turbo, temperature = 1.5, narrative-profile prompt, full demographic info. It produced r = .84.

Interestingly, that configuration performed quite poorly in Study 1 for recovering associations between scales.

1 day ago

5/
Are the findings of Argyle et al. influenced by analytic flexibility? Re-running their silicon sampling approach using different choices for model, temperature/reasoning effort, prompt content, and demographic information provided, I say: yes.
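The flexibility here is combinatorial: each defensible option multiplies the number of possible pipelines. A toy sketch of the grid, where the specific option lists are my own illustrative assumptions rather than the exact grid run in the paper:

```python
from itertools import product

# Hypothetical option lists for each analytic decision point.
models = ["gpt-3.5-turbo", "gpt-4o"]
temperatures = [0.0, 0.7, 1.0, 1.5]
prompts = ["narrative-profile", "bullet-profile", "interview"]
demographics = ["none", "basic", "full"]

# Every combination is one defensible silicon-sampling pipeline.
pipelines = list(product(models, temperatures, prompts, demographics))
print(len(pipelines))  # 2 * 4 * 3 * 3 = 72
```

Even this small grid yields 72 pipelines; if results vary meaningfully across them, a single reported configuration tells you little.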

1 day ago

4/
I looked at this in the context of Argyle et al. (2023)'s third study, which examined whether silicon samples can effectively recover patterns of association among political/demographic variables using the ANES 2016 Time Series Study data.
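One simple way to score "recovering patterns of association" is to correlate the pairwise correlations of the silicon dataset with those of the human dataset. The sketch below is a toy illustration of that idea; the variable names and data format are invented, and this is not the ANES analysis code.

```python
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den

def association_recovery(human, silicon):
    """Correlate the upper-triangle correlation vectors of two
    {variable_name: values} datasets sharing the same variable names."""
    pairs = list(combinations(sorted(human), 2))
    h = [pearson(human[a], human[b]) for a, b in pairs]
    s = [pearson(silicon[a], silicon[b]) for a, b in pairs]
    return pearson(h, s)
```

A perfect silicon sample would reproduce every pairwise association, giving a recovery correlation of 1; the question in Study 2 is how far, and how erratically, different pipeline choices fall short of that.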

1 day ago

3/
I originally demonstrated how flexibility could affect estimates in a specific case study. But reviewers asked: how does flexibility affect a published result?

1 day ago

2/
The main new additions:

- A revised title!
- A new second study that demonstrates the impact of analytic flexibility on an already-published, highly cited (>1300 times) silicon sample study (Argyle et al. 2023, Study 3).
- A new exploration of how we might use these samples more robustly.

1 day ago

1/
"Silicon samples" are becoming more and more common in research and polling.

One problem: depending on the analytic decisions made, you can basically get these samples to show any effect you want.

The updated version of this preprint is now online!

THREAD🧵

arxiv.org/abs/2509.13397

1 day ago
The threat of analytic flexibility in using large language models to simulate human data: A call to attention. Social scientists are now using large language models to create "silicon samples" - synthetic datasets intended to stand in for human respondents, aimed at revolutionising human subjects research. How...

Plus, the data that comes from AI is "weird in ways we don't understand." (Bisbee) Research from metascientist Jamie Cummins finds that "…a very small number of decisions can dramatically change the correspondence between silicon samples and human data."

3 weeks ago
Polling has an AI respondent problem. Democracy doesn't know what's coming.

Since last December, I have been obsessed with how AI has influenced polling and survey research. Here is the tl;dr:

1- Some researchers are using "silicon respondents" (AI bots) to simulate human responses
2- AI bots have gotten so advanced they can evade detection in some online surveys

3 weeks ago
Participants were explicitly informed that there was no penalty for providing wrong answers, their payment didn’t depend on how many questions they solve correctly, and they were requested to do the task to the best of their abilities

lmfao

1 day ago

I’m sorry but this is an absolutely insane conclusion for the authors to draw given what they actually did

1 day ago
PhD Student in Meta-Science and Clinical Psychology - Universität Bern. Universität Bern is looking for a PhD student in Meta-Science and Clinical Psychology

I’m hiring a PhD student!

The candidate will work alongside @zefreeman.bsky.social, who is joining our research group as postdoc.

jobs.unibe.ch/job-vacancie...

3 days ago
me on the TED stage

Me giving the TED talk

Me, still talkin'

I gave a TED talk today!

6 days ago
Science needs downvotes. A bug bounty module in grants would give criticism a leg up. The Soviet Union was good at producing shoes. Factories made 800 million pairs a year, twice as many as Italy, three times as many as the...

New post on The 100% CI: Science needs downvotes.
www.the100.ci/2026/04/13/s...
In which I make the case that grant funders should add funding lines that include a module for bug bounties.

1 week ago

New blog posts, new mistakes, new (to me) kinds of error! Recently, @schmukle.bsky.social was confused about a plot in one of my papers. This usually means I messed up bad. After a couple-month-long investigation, I can now report: I messed up, but fortunately it was fairly inconsequential.

1 week ago

You know, I really feel that living in Switzerland has increased my German-ness, not decreased it. Clearly baseline German-ness is an important factor in Switzerland's effect!

1 week ago
Offering scientists cash to spot errors in published papers doesn’t work. The ERROR project tried enticing reviewers with payments. Now, it’s launching a journal—and promising papers as rewards

The ERROR project recruits independent experts to recheck social science papers’ data, statistics, methodology, and code; now the project plans to publish the reviews in a new peer-reviewed journal.
science.org/content/arti...
@dalmeet.bsky.social @science.org
#reproducibility

1 week ago

Yeah, OK, that reads.

1 week ago

Super happy to be giving a keynote talk at this timely and topical conference. I will talk about our recent efforts to build and evaluate workflows to help compare study registrations and papers in psychology, medicine, economics, and preclinical trials.

Come join us in Eindhoven!

2 weeks ago
META/e Events: META/e Conference on Transparency, Technology and AI in Peer Review. Eindhoven University of Technology, 5 June 2026. Maximum attendees: 100. Organizers: Vlasta Sikimić (https://vlastasikimic.com/),...

Hope to see you at the Conference on Transparency, Technology and AI in Peer Review on June 5, 2026, which is organized in collaboration with the Center for Humans and Technology at Eindhoven University of Technology.

Register at: meta-eindhoven.github.io/events/

2 weeks ago

Fun facts

- The original study reported 23 IQ point gains
- Jordan Peterson tweeted about it (sort of)
- @jamiecummins.bsky.social's new preregistered RCT shows null effects

2 weeks ago

A failure to find effects of relational operant training on scholastic aptitude of school children: A randomised controlled trial: https://osf.io/cauvz

2 weeks ago