Our PhD student Kennedy Orwa, who studies applications of AI to health care, was hastily deported to Kenya today, along with his 13-year-old son, without an opportunity to speak to legal counsel.
King 5 reports that he held a valid visa that was rescinded without explanation.
Posts by Oleg Urminsky
This was a multi-year undertaking: it started with a very Google-specific question years ago, and we kept working on it into the dawn of AI chat search, which added a whole new dimension to the paper.
Our results suggest that, at least in the context of search, confirmation bias in framing questions is the primary factor -- otherwise the two biases would cancel out and there would be no effect of broadening search.
An interesting theoretical aspect of this practical question: research in psychology has identified two potential dimensions of confirmation bias, bias in framing questions and bias in which answers we pay attention to.
When platforms adjust algorithms to return broader search results, people update their beliefs more (under some conditions). This can be a good thing, to the degree that the incremental information provided by broadening is accurate.
The paper, led by Eugina Leung, is here:
www.pnas.org/doi/10.1073/...
We document that confirmation bias in question framing (people's tendency to frame searches in terms of their prior beliefs) and algorithms that optimize for relevance combine to impede belief updating.
Very excited that this @chicagoboothreview.bsky.social interview and our paper on the "Narrow Search Effect" are both out:
www.chicagobooth.edu/review/podca...
🧵
People’s spontaneous searches are narrow, aimed at producing results in line with their prior beliefs, which then tend to persist. But when people are presented with broader information, they update their beliefs more, research by Leung & @olegurminsky.bsky.social suggests:
When you collect data online, are the results from humans or AI? In a project led by Booth PhD student Grace Zhang, we estimate the prevalence of AI agents on commonly used survey platforms:
osf.io/preprints/ps...
🧵
Yes, off-the-shelf agents fail these checks, and we do see failures on the platforms. There are also respondents who fail other checks but pass these. One limitation of our study, however, is that we bundled it with a typing check, so we don't have as clear a read on the specific checks.
Worth a look if you're running online studies. Happy to be a part of this; great work led by Grace. Check out Oleg's thread; feedback welcome!
Agreed, we did not test proctoring.
Nope, not dead at all; it's actually the primary method for collecting custom survey data in social science. My personal opinion is that online data collection is an incredibly valuable resource that we need to invest in saving from obliteration by AI; your view may differ.
Ooops. Meant to say "AI checks can also mistake non-compliant human respondents for AI."
Feedback and questions are welcome!
Some caveats. Off-the-shelf AI agents can pass survey checks with human assistance and AI agents can be purpose-built to pass checks without human assistance. Detecting AI agents is a moving target: ongoing independent testing of survey platforms is needed.
Using AI checks to screen out respondents is a bad idea: it helps AI learn to evade the checks. Better to just collect the data and pre-register exclusions (and move to a better platform when the fail rate is too high).
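For anyone curious what that workflow could look like in practice, here is a minimal sketch in Python/pandas. The file name, column names (passed_ai_checks, typing_check), and the exclusion rule are purely illustrative assumptions, not the checks from our study:

```python
import pandas as pd

# Collect everything first, then apply pre-registered exclusions at analysis time.
# File name, column names, and the rule below are hypothetical, for illustration only.
df = pd.read_csv("survey_responses.csv")

# Hypothetical pre-registered rule: drop respondents who pass fewer than 5 AI checks
# or who fail the typing check (assumed to be a boolean column).
pre_registered_exclusion = (df["passed_ai_checks"] < 5) | (~df["typing_check"])
analysis_sample = df[~pre_registered_exclusion]

# Track the overall fail rate; if it gets too high, consider switching platforms.
fail_rate = pre_registered_exclusion.mean()
print(f"Pre-registered exclusion rate: {fail_rate:.1%}")
print(f"Analysis sample size: {len(analysis_sample)} of {len(df)}")
```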
AI checks can also mistake human respondents for AI. Survey participants ignoring instructions and copy-pasting text into open-ended responses has been a problem for a long time:
marginallysignificant.com/2019/03/18/u...
Some thoughts. Relying on just one AI check is not a good idea – AI agents differ in their capabilities. Use multiple tests, varied regularly. Not checking against a human baseline may lead to over-estimating AI agents: AI and humans are bad at some of the same things.
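To illustrate the human-baseline point (a toy calculation with invented numbers, not the estimator from our paper): if verified humans fail some of the same checks, the raw platform failure rate overstates AI prevalence.

```python
# Toy numbers, invented for illustration only.
platform_fail_rate = 0.12   # share of platform respondents failing the AI-check battery
human_baseline_rate = 0.04  # share of verified in-person humans failing the same battery

# Ignoring the baseline treats every failure as an AI agent.
naive_estimate = platform_fail_rate

# Subtracting the human false-positive rate is one simple correction.
adjusted_estimate = max(0.0, platform_fail_rate - human_baseline_rate)

print(f"Naive estimate of AI prevalence: {naive_estimate:.0%}")
print(f"Baseline-adjusted estimate:      {adjusted_estimate:.0%}")
```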
This can matter for survey results. Platforms with more AI failures yielded lower estimates of disapproval of using AI to complete surveys. Excluding potential AI agents reduced those differences (vs. in-person humans at Mindworks @CDR_Booth).
The results differ substantially across platforms. @joinprolific.bsky.social and @cloudresearch.bsky.social ’s Connect panel have relatively low failure rates, while MTurk (even via @cloudresearch.bsky.social) has a high failure rate.
We use five AI checks, validating that common AI agents fail our checks but in-person human respondents do not. We then collect data on 7 online platforms.
Recent work by @seanjwestwood.bsky.social in PNAS has raised a red flag about AI agents being able to complete online surveys.
www.pnas.org/doi/10.1073/...
“I've said to a couple of my colleagues, like, man, what would it have looked like in 2024 when we're in the thick of the campaigns, if we stopped being so defensive on immigration and we went on the offensive; had we all collectively rang the alarm when we saw it start.”
From this follow simple recommendations: as a default, meta-scientific studies of published research artefacts need to include 1) a full, identifiable list of included studies, 2) the full coding instrument and decision rules, and 3) the individual ratings together with a codebook.
Our instinct to seek confirmation leads to ‘narrow’ internet search behaviour.
Chatbots, trained to be helpful, tend to go along with this, but could be trained to help us update our beliefs.
@olegurminsky.bsky.social researched this and explains what he found:
buff.ly/0eafZ78
"It's a general issue in lots of technology that the technology is designed to try to be helpful to us and around our needs, but often it takes a simple view of what those needs are and leaves out some of the needs." - @olegurminsky.bsky.social
www.chicagobooth.edu/review/podca...
This is one of the most beautiful things I have witnessed; the craft here is impeccable.