This is a great follow-up to our recent preprint! This small-scale evaluation introduces a framing-resistant prompt and takes a step toward exploring the mitigation space for the framing-sensitivity problem.
Posts by Hye Sun Yun
Thanks! I really enjoyed the write-up of your evaluation work. I definitely agree that the evaluator model and even the evaluation prompt matter a lot. The framing-resistant prompting was interesting and is a great start toward finding mitigations for this issue!
Good point! We didn't evaluate on what we would call an "easy" task; instead, we tried to simulate a task closer to a real-world setting to empirically show how the framing effect can impact patients directly.
I would like to thank my amazing co-authors!
Geetika Kapoor, @mackert.bsky.social, @ramezkouzy.bsky.social, @cocoweixu.bsky.social, @jessyjli.bsky.social, and @byron.bsky.social. [6/6]
Please check out our full findings here: arxiv.org/abs/2604.05051
Our conclusion: LLM medical responses vary based on question phrasing alone, despite identical underlying evidence. For patients and consumers, how you ask may determine what you're told. [5/6]
We also compared using technical terms vs. plain-language terms in our questions. However, we didn't find any meaningful differences between these language styles. [4/6]
This framing effect is further amplified in multi-turn conversations, where sustained persuasion increases inconsistency. [3/6]
"Does this work?" vs. "Does this not work?" Do the conclusions differ even though the LLM was given the same evidence documents?
Yes. Positive vs negative framing leads to more contradictory conclusions than responses from the positive question sampled twice. [2/6]
Patients ask LLMs medical questions, but how they phrase them matters more than it should.
Our new preprint explores how different phrasings of patient health questions can lead to inconsistent conclusions, even with the same evidence. [1/6]
Full Paper: arxiv.org/abs/2604.05051
Thrilled to share that our research showing how LLMs can be influenced by bias from "spun" medical literature is now featured in Northeastern's Khoury news! It offers critical insights as AI enters healthcare.
The full paper can be found at arxiv.org/abs/2502.07963
I am at CHIL this week to present my poster (Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?) on Thursday June 26.
Looking forward to connecting and sharing our work on spin with the CHIL community!
I am at CHI this week to present my poster (Framing Health Information: The Impact of Search Methods and Source Types on User Trust and Satisfaction in the Age of LLMs) on Wednesday, April 30.
CHI Program Link: programs.sigchi.org/chi/2025/pro...
Looking forward to connecting with you all!
LLM-based chatbots are changing how people search for health information—but how do users perceive their quality and trustworthiness compared to other online sources? Our survey study explores these questions. Check it out! www.jmir.org/2025/1/e68560
I'm searching for some comp/ling experts to provide a precise definition of “slop” as it refers to text (see: corp.oup.com/word-of-the-...)
I put together a Google Form that should take no longer than 10 minutes to complete: forms.gle/oWxsCScW3dJU...
If you can help, I'd appreciate your input! 🙏
Huge thanks to my amazing co-authors!
Karen Y.C. Zhang, @ramezkouzy.bsky.social, @ijmarshall.bsky.social, @jessyjli.bsky.social, & @byron.bsky.social [7/7]
Check out our full findings here: arxiv.org/abs/2502.07963
Can we fix this? We tested zero-shot prompts to reduce LLMs' susceptibility to spin.
Good news: prompts that encouraged reasoning reduced their tendency to overstate trial results! 🛠️
Careful design is key to improving evidence synthesis for clinical decisions. [6/7]
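To make the contrast concrete, here is a minimal sketch of the two zero-shot prompting styles this post compares: a direct rating request vs. one that encourages reasoning about the evidence before rating. The exact wording and function names are illustrative assumptions, not the paper's actual prompts; only the 0-10 favorability scale comes from the thread itself.

```python
# Hypothetical sketch of two zero-shot prompting styles for assessing a
# trial abstract. NOT the authors' exact prompts -- an illustration of the
# direct vs. reasoning-encouraging contrast described in the thread.

def direct_prompt(abstract: str) -> str:
    """Directly ask for a 0-10 favorability rating of the treatment."""
    return (
        "Read the abstract below and rate how favorable the treatment's "
        "results are on a 0-10 scale. Answer with a single number.\n\n"
        f"Abstract: {abstract}"
    )

def reasoning_prompt(abstract: str) -> str:
    """Encourage reasoning about the reported evidence (primary outcome,
    statistical significance, overstated language) before rating."""
    return (
        "Read the abstract below. First, identify the primary outcome and "
        "state whether its result was statistically significant, noting any "
        "language that overstates the findings. Then rate how favorable the "
        "treatment's results are on a 0-10 scale.\n\n"
        f"Abstract: {abstract}"
    )
```

The idea behind the second style is that asking the model to check the primary outcome and its significance first anchors the rating in the reported evidence rather than in the abstract's framing.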
When we asked LLMs to simplify abstracts into plain language, they often propagated spin into their summaries. This means LLMs could unintentionally mislead patients and non-experts about the effectiveness of treatments. 😱 [5/7]
We asked LLMs how favorably they perceived a treatment’s results (0-10 scale). Even though LLMs could detect spin, they were far more influenced by it than human experts.
Meaning: LLMs believed spun abstracts presented more favorable results! 😬 [4/7]
When we prompted 22 LLMs to identify spin in medical abstracts, we found that they were moderately to strongly capable of detecting spin.
However, things got interesting when we asked LLMs to interpret the results… [3/7]
🔽
So what is spin?
Spin refers to reporting strategies that make experimental treatments appear more beneficial than they actually are—often distracting from nonsignificant results.
Example:
❌ “The treatment shows a promising trend toward significance…”
✅ “No significant difference was found.”
[2/7]
🚨 Do LLMs fall for spin in medical literature? 🤔
In our new preprint, we find that LLMs are susceptible to biased reporting of clinical treatment benefits in abstracts—more so than human experts. 📄🔍 [1/7]
Full Paper: arxiv.org/abs/2502.07963
🧵👇
As someone interested in an academic position post-PhD, I found this post very helpful. Thank you for sharing your wisdom and advice.
Awesome! Thank you
Thank you!
The application form says it is no longer accepting responses. Is the application closed now?