Paper: coevolution.fas.harvard.edu/publications...
It's what our sciences team at @Prolific also built HUMAINE to address - LLM evaluation using demographically stratified participants. We're presenting it at #ICLR2026 soon!
huggingface.co/spaces/Proli...
Posts by phelimb
Researchers from @harvard.edu find that LLMs claiming "human-like" performance actually reflect a very specific subset of humanity.
They cluster closest to WEIRD populations (Western, Educated, Industrialized, Rich, Democratic), diverging as psychological distance increases (r ≈ -0.70) 👇🏻
AI pollution in human data samples is a hot topic.
Some great work from @andrewgordon.bsky.social et al. showing that concerns here are (generally) overblown, with the majority of platforms empirically showing low levels of AI pollution.
osf.io/preprints/ps...
New preprint out today (osf.io/preprints/ps...). We tested whether AI agents are actually infiltrating online surveys.
Spoiler alert: they aren't
Thread 🧵
[1/9]
As of today, if an AI agent is detected in your Prolific study, you'll get twice the cost of that participant back. We’re calling this our 100% Human Guarantee.
Years of investing in @joinprolific.bsky.social's system has made us confident in data integrity.
www.prolific.com/100-human-gu...
New working paper on online research data quality, led by @univie.ac.at, reveals that pass rates on quality checks vary wildly by source. Pretty interesting.
Prolific: 90% | Lab: 80% | Bilendi: 73% | Moblab: 55% | MTurk: 9% | AI agents: 0%
github.com/survey-data-... CC @jyusof.bsky.social
I don't disagree that the rules of the game likely need to change (or have changed already). Assumptions about what's required to guarantee different levels of assurance will need to change too.
Personally, I'm confident this is addressable with sufficient innovation/investment. It may require some significant changes in how we run projects, though, e.g. controlled environments, multimodal tooling beyond text, and high-confidence auditing back to identified humans.
+1.
Eerke, I can only speak for Prolific, but the team here is full of smart, motivated people working extremely hard to maintain and improve the integrity and quality of our platform for running online research. The misuse of AI tools is a threat, but one that can be protected against.
Lots of hard work from the Prolific team to achieve the lowest rate of AI misuse detected in this study. More to do to get this to 0, though!
The sky is not falling; high-quality platforms (Prolific, Verasight, CR Connect) have low rates of apparent bots. osf.io/preprints/ps... But also not zero; vigilance is very much needed!
We ran a controlled study of 125 verified humans vs 5 AI agents. Can agents reliably be detected?
Here's what we found:
www.prolific.com/resources/au...
Beta is currently available in Qualtrics. We're actively scoping integrations with additional platforms, and the tech is generalisable. If helpful, I'd be happy to connect you with someone from Prolific to discuss your feedback.
Frontiers episode 1:
Jerome Wynne from @Prolific in conversation with Crystal Qian, from Google DeepMind, talking about Deliberate Lab: a platform for running online research experiments on human + LLM group dynamics.
www.youtube.com/watch?v=5vyi...
AI agents are becoming a serious threat to research data quality.
Today we’re rolling out Bot authenticity checks on @joinprolific.bsky.social, detecting agentic AI with 100% accuracy in testing.
Comes with a native Qualtrics integration! More info:
www.prolific.com/resources/in...
Fresh HUMAINE results are here.
Gemini 3 is still first, but Mistral Large 3 and Deepseek v3.2 are making things interesting.
Opus 4.5 didn't dominate, but Anthropic is likely prioritizing complex reasoning/coding over the conversational fluency that this benchmark favors.
prolific.com/humaine
Lots of chatter about this paper currently. It's a stark warning, but at present I see it as a warning of what might come, not of what is happening now. As a research community we need to treat it as a call-to-arms to develop new strategies, NOT a call to abandon online sampling. Reasoning below
All fair. I expected to see more diversity in modalities, too. Qual studies (which can now be done at scale) are likely to be more robust than survey-only designs.
Studies aren't distributed on a first-come, first-served basis, but it's a useful theory. I'll share it with the team.
Right. It is generalisable (JS plugin), though we don't have a native integration with oTree yet.
Will reach out to the authors to see if we can understand more details & see if we can add Authenticity Check as a mitigation option.
45% of participants copying OR pasting ~= 45% LLM use.
Only a single-digit number of responses seem to fail their honeypot and other mitigations, which is closer to our internal prevalence measures.
There are many reasons to copy/paste while still being a conscientious human.
"Even to an untrained eye, some of these responses were obviously generated by LLMs" – but the percentage doesn't seem to be reported?
If I'm reading the paper correctly, their prevalence detection was "we only tracked copying and pasting on a page containing an open-ended question". That's a fairly crude measure of LLM use, and an upper bound rather than an accurate prevalence estimate.
LLM use by real humans is a slightly different threat from the scaled-agent threat discussed in the paper, though, and I think it requires a bit more nuance in its response.
I hadn't, thanks for sharing. I agree with many of the mitigation strategies, though given the data was collected on Prolific we would have recommended our built-in tool.
researcher-help.prolific.com/en/articles/...
prolific.com/resources/pr...
If you want to work on these problems, or collaborate on research in this area, get in touch. Much more to come in this space!
Without minimising the seriousness of the threat raised in this paper, I'm more optimistic. This is just the latest challenge to the integrity of online research.
We've been proactively adding to our suite of authenticity tools - more every week - including many of Sean's recommendations:
We also do spot checks to protect against account reselling: participant-help.prolific.com/en/articles/...
Not sure I agree – these are tractable challenges and we are working on them.
bsky.app/profile/phe-...