Hello world 👋
My first paper at UT Austin!
We ask: what happens when medical “evidence” fed into an LLM is wrong? Should your AI stay faithful, or should it play it safe when the evidence is harmful?
We show that frontier LLMs accept counterfactual medical evidence at face value.🧵
Posts by Ramez Kouzy, MD
An overview of our AI-in-the-loop expert study pipeline: given a claim from a subreddit, we extract the PIO elements and retrieve the evidence automatically. The claim, its context, and the evidence are then presented to a medical expert to provide a judgment and a rationale for the factuality of the claim.
Are we fact-checking medical claims the right way? 🩺🤔
Probably not. In our study, even experts struggled to verify Reddit health claims using end-to-end systems.
We show why—and argue fact-checking should be a dialogue, with patients in the loop
arxiv.org/abs/2506.20876
🧵1/
Methods and initial result
Results
More results and discussion, including comparison to proper LLM-assisted search.
Appendixes showing UI and some of the measures.
Do chatbots hinder #criticalThinking?
Contrary to fallacious causal conclusions drawn from correlational studies, this experiment found a scripted chatbot increased correct #factChecking solutions compared to unassisted students (N = 156).
doi.org/10.1016/j.ch...
#edu #tech
Excited to pilot our human performance augmentation approach with #AR/VR in #brachytherapy. Placing critical info in the physician's field of view eliminates workflow disruptions + is more ergonomic. Grateful for the opportunity to build this solution @mdanderson.bsky.social!
Basic backend workflow: data fetching, then analysis with a dual GPT-4o mini/GPT-4o combo using prompt engineering + JSON schema output. Thinking of adding new features soon that would augment trial understanding for busy clinicians.
Happy to share one of my side projects: TRAC - Trial Reasoning and Analysis Companion. A web app I developed for augmenting trial understanding and variable visualization with GPT-4o under the hood. #AITools
Great summary of our latest work realized by @hyesunyun.bsky.social 👇🏼 We find that LLMs are susceptible to spin in medical abstracts and can propagate it into plain-language summaries. However, prompting techniques such as CoT can help mitigate that. @jessyjli.bsky.social @byron.bsky.social
A new, completely open reasoning model out of China, DeepSeek-R1, is now available. The benchmarks show it at parity with the likes of o1 and Sonnet.
In some informal tests on non-code problems, it is really good, not o1-pro level but surprisingly capable (and incredibly small & fast!). Big advance.
We suck at predicting the future of AI, and that's perfectly fine. Maybe the real question isn't 'What will happen?' but 'So what?'
My new post on Substack goes into this more deeply, as I personally struggle with how to make sense of all of this.
greypascal.substack.com/p/beyond-pre...
Spot on! I would extend this beyond just firms to broadly any task. I bet even in fields like healthcare, people would be surprised by the error rate of humans. This post hits the nail on the head: open.substack.com/pub/greypasc...
Ramez Kouzy, Roxanna Attar-Olyaee, Michael K. Rooney, Comron J. Hassanzadeh, Junyi Jessy Li, Osama Mohamad
QuaLLM-Health: An Adaptation of an LLM-Based Framework for Quantitative Data Extraction from Online Health Discussions
https://arxiv.org/abs/2411.17967
Happy Thanksgiving! 🦃 I wrote something that's been on my mind for a while about how we approach uncertainty in healthcare—and how AI might help bridge this gap. Check it out here: open.substack.com/pub/greypasc...
Cervical cancer mortality in US women younger than 25 years significantly declined between 2016 and 2021, likely due to the widespread adoption of HPV vaccination.
ja.ma/4i9ghPC
The Butterfly Nebula from Hubble
I just published my first Substack piece into the ether.
Why do we demand superhuman performance from AI while normalizing human imperfection?
greypascal.substack.com/p/the-perfec...
Love to be added! 🙏🏼
Starting a list of oncology-related people. Please tell me more to add, or any similar lists. go.bsky.app/GKXp9Fy @n8pennell.bsky.social
🙋🏻‍♂️🙋🏻‍♂️ please