Posts by Melanie Brucks
As videoconferencing becomes standard in work, health, education, and the legal system, people without reliable high-speed internet may face systematic disadvantages. The internet is often called an equalizer, but if access to the internet is unequal, it may instead exacerbate disadvantage. (6/6)
And not all glitches are equally harmful: the more uncanny a glitch feels, the more it undermines evaluations of the person on screen. (5/6)
We found that glitches undermined interpersonal judgments only in video calls that simulate face-to-face interaction (and therefore produce uncanniness). This shows that the negative effect of glitches goes beyond mere disruptiveness, comprehension difficulties, and negative attributions. (4/6)
Why does this happen? Glitches disrupt the illusion of real face-to-face interaction. Distorted faces, choppy motion, and audio hiccups create a strange, creepy, or eerie feeling termed “uncanniness,” and that feeling undermines interpersonal judgments. (3/6)
For example, glitches during job interviews decrease hiring likelihood, and in actual parole hearings, the presence of glitches was associated with a 12-percentage-point difference in whether someone was granted parole (48% vs. 60%). (2/6)
Loved seeing GLITCHES in yesterday’s NYT crossword. Perfect timing for our new paper w/ Jacqueline Rifkin (co-first author) and Jeff Johnson, which examines how minor video-call glitches (even when no information is lost) meaningfully impact important decisions. (1/6)
7/ Read the full paper → journals.plos.org/plosone/arti...
Happy to discuss or answer questions!
6/ Why it matters
Prompt architecture can influence outputs in high-stakes domains:
• Hiring decisions
• Medical triage
• Policy or scientific research summaries
• Your research papers!
In each case, prompt architecture could silently skew results—unless we actively correct for it.
5/ Mitigation strategy
Instead of searching for a “perfect” prompt, we propose Prompt Aggregation: by asking the same question multiple ways and combining the answers, we can cancel out these biases.
In our “honey vs maple” example, aggregation favors honey in 5 of 8 prompts. Try it out yourself!
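A minimal sketch of what aggregation could look like in practice, assuming a simple majority vote (the posts do not say how answers are combined, so the voting rule is an assumption). The `responses` list is invented placeholder data, arranged only to mirror the 5-of-8 split mentioned above; it is not output from the study. The helper names `implied_closer` and `aggregate` are hypothetical. The one subtlety shown is that an answer to a "further" framing has to be inverted before it is counted.

```python
from collections import Counter

# The two options being compared (from the thread's example).
OPTIONS = ("honey", "maple syrup")


def implied_closer(choice, framing):
    """Return the option the answer implies is *closer* to sugar.

    Under a "further" framing, picking one option as further away
    implies the other option is the closer one.
    """
    if framing == "closer":
        return choice
    return OPTIONS[1] if choice == OPTIONS[0] else OPTIONS[0]


def aggregate(responses):
    """Majority vote over (choice, framing) pairs (assumed combining rule)."""
    votes = Counter(implied_closer(choice, framing) for choice, framing in responses)
    winner, count = votes.most_common(1)[0]
    return winner, count, len(responses)


# Placeholder answers for 8 prompt variants (NOT real model output).
responses = [
    ("honey", "closer"),
    ("maple syrup", "closer"),
    ("honey", "closer"),
    ("maple syrup", "closer"),
    ("maple syrup", "further"),
    ("honey", "further"),
    ("maple syrup", "further"),
    ("maple syrup", "further"),
]

winner, count, total = aggregate(responses)
print(f"{winner} favored in {count} of {total} prompts")
```

With real data, each `choice` would come from a model call on one prompt variant; ties would need an explicit rule, which this sketch does not define.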
4/ Implication: There is no neutral prompt
You can't write your way around prompt architecture effects because any prompt must have some order, some framing, some structure.
GPT-3, GPT-4, and Llama 3.1 all exhibited different prompt architecture biases.
3/ Core insight
We found LLMs are systematically biased by seemingly trivial prompt architecture:
• Option order (e.g., "honey or maple" vs "maple or honey")
• Option labels (e.g., A/B vs B/A)
• Question framing (e.g., "closer" vs "further")
• Asking for justification
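To make the four dimensions above concrete, here is a rough sketch that enumerates every combination for the honey/maple question. The prompt wording, the "Explain your reasoning." suffix, and the function name `build_variants` are illustrative assumptions, not the exact prompts used in the paper.

```python
from itertools import product


def build_variants(option_x, option_y, target):
    """Build one prompt per combination of the four architecture choices."""
    variants = []
    for swap_order, swap_labels, framing, justify in product(
        (False, True),                    # option order
        (False, True),                    # option labels (A/B vs B/A)
        ("closer to", "further from"),    # question framing
        (False, True),                    # ask for justification or not
    ):
        first, second = (option_y, option_x) if swap_order else (option_x, option_y)
        label_1, label_2 = ("B", "A") if swap_labels else ("A", "B")
        prompt = f"Is {label_1}) {first} or {label_2}) {second} {framing} {target}?"
        if justify:
            prompt += " Explain your reasoning."  # assumed wording
        variants.append(prompt)
    return variants


variants = build_variants("honey", "maple syrup", "sugar")
print(len(variants))  # 2 * 2 * 2 * 2 combinations
for prompt in variants[:4]:
    print(prompt)
```

Each of these strings asks the same underlying question, which is what makes any difference in the answers attributable to prompt structure.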
Seems straightforward... until you flip the order. Same question, different answers. But why?
User: Is A honey or B maple syrup closer to sugar?
ChatGPT: The answer is B, maple syrup. Maple syrup is closer to pure sugar than honey.
1/ Setup
Imagine you ask a simple question to ChatGPT:
New paper alert with Olivier Toubia! We show how prompt architecture introduces systematic error in LLM responses.
🧵Key findings from our study on prompt structure (and how to mitigate silent bias in your research):