We have a new preprint that underscores some key claims here: even if one *can* design an agent that gets through a survey fine, it doesn't follow that such agents are undetectable or common. We find that they are far from common! Preprint link in thread👇
Posts by Andrew Gordon
If this sounds like the research career opportunity you've been waiting for, I'd love to hear from you. Apply via the link or DM me 👇 [4/4]
US applicants: job-boards.eu.greenhouse.io/prolific/job...
UK applicants: job-boards.eu.greenhouse.io/prolific/job...
We're looking for someone with:
• A PhD in a relevant quant field
• Experience in online research methods — sampling, data quality, synthetic data
• A track record of publications or public outputs
• US east coast preferred, but open to UK also
• Postdoc/industry experience preferred
[3/4]
The hire will own a dedicated research portfolio, design an agenda, publish in high-impact journals, collaborate with leading researchers, and present at conferences.
This is an IC role with real autonomy to pursue the most interesting open questions in online research! [2/4]
Job ad: I'm hiring a Research Scientist to join my team at @joinprolific.bsky.social
If you've ever wondered who's working on the hard questions in online research — data quality, sampling methodology, the effect of AI on how research gets done — this is that job. [1/4]
@kwcollins.bsky.social the costs are aggregated at the platform-type level, yes
@mrandall.bsky.social correct!
honestly, I have no idea
still see it mentioned at conferences though
The takeaway: LLM agents are not infiltrating the platforms researchers actually use. Outside of one platform already notorious for pre-LLM bot problems, they just aren't there.
The human data quality problem, on the other hand, is large, systematic, and fixable through platform choice.
[9/9]
And the cost finding:
At a 90% quality threshold, Direct panels cost $8.26 per quality respondent. Marketplace: $74.43.
That is not a typo. "Cheap" platforms are often the most expensive data you can buy.
[8/9]
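The per-quality-respondent figures above fall out of a simple ratio: what you pay per completed response, divided by the fraction of responses that clear the quality bar. A minimal sketch, with hypothetical prices and pass rates (not the paper's actual inputs):

```python
def cost_per_quality_respondent(price_per_complete: float, pass_rate: float) -> float:
    """Effective cost of one respondent who clears the quality threshold.

    price_per_complete: what you pay per submitted response (USD).
    pass_rate: fraction of responses meeting the 90% quality bar.
    """
    return price_per_complete / pass_rate

# Hypothetical inputs, for illustration only:
direct = cost_per_quality_respondent(7.00, 0.85)       # high pass rate -> ~$8.24
marketplace = cost_per_quality_respondent(3.00, 0.04)  # low pass rate  -> ~$75.00
print(f"direct: ${direct:.2f}, marketplace: ${marketplace:.2f}")
```

A nominally cheap platform becomes expensive once the pass rate collapses, which is the mechanism behind the $8.26 vs $74.43 gap.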
The bigger story is human data quality.
Platform type was a far more consequential predictor than anything agent-related. Direct panels outperformed hybrid, which outperformed marketplace, consistently across nearly all measures.
[7/9]
And the MTurk detections didn't look like LLM agents. Poor writing quality, fast completions, clustered arrivals. Classic bot behavior.
When we ran real LLM agents through the same survey, they outperformed average humans and vastly outperformed the flagged responses.
[6/9]
So are AI agents infiltrating surveys? Not really.
Meaningful detections were almost exclusively on MTurk (11-16%).
Every other platform: at or below 1% for our primary detection flag.
[5/9]
Then we recruited 5,200 respondents across 10 platforms spanning direct (Prolific, CloudResearch Connect, Verasight), hybrid (Dynata, Prodege), and marketplace (Cint, Qualtrics, Purespectrum, Prime Panels) types, plus MTurk.
Same survey for everyone. 7 behavioural quality measures. Full metadata.
[4/9]
First we validated our detection methods against real agents (Claude, ChatGPT, Gemini, Perplexity, plus a custom white-hat agent) vs real humans.
Primary method: 100% sensitivity, 100% specificity.
Secondary behavioural battery: 92% sensitivity, 99.2% specificity.
[3/9]
There's been a lot of alarm about LLM agents polluting survey samples. Capability demos are impressive. But capability is not the same as deployment within an ecosystem.
We wanted to know what's actually happening in the platforms researchers use.
[2/9]
New preprint out today (osf.io/preprints/ps...). We tested whether AI agents are actually infiltrating online surveys.
Spoiler alert: they aren't
Thread 🧵
[1/9]
Starting today, if an AI agent is detected in your Prolific study, we’ll give you twice the cost of that non-human participant back.
You pay for human data. You expect human data. We’re backing that with our 100% Human Guarantee.
Learn more: www.prolific.com/100-human-gu... #AcademicSky #Research
Fantastic example of researchers working together (and the utility of rebuttals to published work). I think we all agree that this is an area we need to invest time in, but we also need to be very careful that conclusions/interpretations are warranted from the data we collect.
@drbarner.bsky.social you might want to read this bsky.app/profile/ache...
Your letter very clearly reads as saying that there are bots, and you use that to question the integrity of online sampling... but you agree that your measures are not sufficient to establish such a claim. Don't you think an amendment to your published letter is required?
Recently, van der Stigchel and colleagues posted a provocative commentary suggesting that we should be wary of bots in online behavioral data collection (🧵by @cstrauch.bsky.social here: bsky.app/profile/cstr...). But should we? Here is my response letter osf.io/preprints/ps.... 1/5
On Thursday I'll be taking part in a roundtable “How real is the LLM threat to online research in academia?” for @joinprolific.bsky.social alongside @davmicrot.bsky.social, Michael Nicholas Stagnaro, and Raluca Rilla.
Sign up here: lnkd.in/em8-MpjN
Interval strongly predicted retention: 1-week → 80% completed all sessions; 4-week → 50%.
Payment had no significant effect.
Participants higher in routine showed better retention; those higher in automaticity showed worse retention.
Read the paper here: osf.io/preprints/ps...
What drives retention in online longitudinal research? We conducted an experiment (N=1,798) on @joinprolific.bsky.social, orthogonally manipulating payment rate (£6–£9/hr + bonus) and session interval (1, 2, or 4 weeks) across five sessions. The findings challenge some common assumptions 👇
Hey @rory-stewart.bsky.social @alastaircampbell2.bsky.social @therestpolitics.bsky.social I put your favourite question to 1,936 US adults: "Who do you believe is the biggest threat to global order and security?"
33.8% of Americans rate the US as the biggest threat.... 🤯
@michaeljkane.bsky.social pop me over a message and will do what I can to help!
Second, and most importantly, the economics don't make sense, even at $0.05/response. Scaling this approach would require multiple user accounts (which we have robust guards against), so the break-even point for a bad actor is likely impossible to reach
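As a back-of-envelope version of that break-even argument, where every number except the $0.05 payout is a hypothetical placeholder:

```python
payout_per_response = 0.05  # $ per accepted response, from the post above

# Hypothetical operating costs for a bad actor who needs many accounts:
responses_before_account_ban = 50  # assumed survival per account before detection
cost_per_fresh_account = 10.00     # assumed: identity verification, device, clean IP

revenue_per_account = payout_per_response * responses_before_account_ban  # ~$2.50
profit_per_account = revenue_per_account - cost_per_fresh_account         # ~-$7.50
print(f"profit per account: ${profit_per_account:.2f}")  # negative: never breaks even
```

Under these assumed costs, each burned account loses money, which is the shape of the break-even problem described above.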
First, the barrier to entry is really high. Creating a bot like this is not trivial; it took an academic team to design and implement. Current 'naive' agents, such as the one offered by ChatGPT, are simple to catch: track mouse movements, typing speed, or even use simple reverse shibboleths
Lots of chatter about this paper at the moment. It's a stark warning, but at present I see it as a warning of what might come, not of what is happening now. As a research community we should treat it as a call to arms to develop new detection strategies, NOT a call to abandon online sampling. Reasoning below
Agreed, it's a stark warning, and it should be a call to arms for the community to find ways to detect such bots. Keen to work with anyone who's interested in figuring that out