Nick Tiller, Ph.D. (@nbtiller) Bsky

AI Chatbot Told Users That Herbal Remedies Can Treat Cancer
Nearly half of chatbot answers to medical questions deemed 'problematic' in study

www.medpagetoday.com/practicemana... #chatbots #health

13 hours ago

ChatGPT’s latest stylistic quirk is sinister, infuriating – and absolutely everywhere | Stuart Heritage
Once you start noticing “it’s not X, it’s Y” as you scroll online, you can’t fail to register it. I’ve become so hypervigilant that it has seeped into my subconscious thoughts

It's one of ChatGPT's most insidious tells: "It's not X; it's Y" has become shorthand for lazy AI slop.

www.theguardian.com/commentisfre...

#AIWriting #ChatGPT

5 days ago

Tiller NB, Marcon AR, Zenone M, et al. Generative AI-driven chatbots and medical misinformation: an accuracy, referencing and readability audit. BMJ Open 2026;16:e112695. doi:10.1136/bmjopen-2025-112695

TAKE HOME MESSAGE:

-Chatbots perform poorly in misinformation-prone health and medical fields.
-Continued deployment without education and oversight risks amplifying misinformation. 7/7 END

6 days ago

Flesch Reading Ease scores. All scores fell between 30 and 50 (“Difficult” readability), equivalent to a college sophomore-to-senior reading level. Each data point is an individual response (25 per chatbot), shown with mean ± SD. Letters mark significant differences: a, vs Gemini; b, vs DeepSeek; c, vs Meta AI; d, vs ChatGPT; e, vs Grok.

6/7 Results: READABILITY

-Readability was graded as “Difficult” (Flesch Scores = 30–50).

-Equivalent to a college sophomore–senior level.

-Gemini 'slightly' better than the others.
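The Flesch Reading Ease score behind these grades is a fixed formula over word, sentence and syllable counts. A minimal Python sketch, assuming the standard Flesch coefficients and the conventional score bands (each band treated as an inclusive lower bound, which is an assumption at the exact boundaries); the study's readability tool isn't named here, and the counts in the example are hypothetical.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_band(score: float) -> str:
    """Map a score to the conventional Flesch difficulty bands."""
    bands = [
        (90, "Very easy"),
        (80, "Easy"),
        (70, "Fairly easy"),
        (60, "Standard"),
        (50, "Fairly difficult"),
        (30, "Difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Very difficult"


# Hypothetical chatbot response: 100 words, 5 sentences, 170 syllables
score = flesch_reading_ease(words=100, sentences=5, syllables=170)
print(round(score, 1), flesch_band(score))  # 42.7 Difficult
```

Long sentences and polysyllabic words both pull the score down, which is why dense, jargon-heavy answers land in the 30–50 band.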

6 days ago

Reference completeness. Dark blue = % of references complete and correct; light blue = % of references incomplete and/or incorrect. Letters mark significant differences: a, vs Gemini; b, vs DeepSeek; c, vs Meta AI; d, vs ChatGPT; e, vs Grok.

5/7 Results: Reference ACCURACY

-Completeness score averaged 40% (authors, dates, DOIs often wrong).

-Frequent hallucinations and fabricated citations.

-No chatbot produced a fully accurate reference list.
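One way to score a citation's completeness is field by field against the verified record. A minimal Python sketch, assuming a hypothetical five-field rubric (authors, year, title, journal, DOI) and exact matching after normalising case and spacing; it is not the study's published scoring method, and the "returned" citation below is invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical rubric: these five fields are illustrative,
# not the study's published scoring scheme.
FIELDS = ("authors", "year", "title", "journal", "doi")


@dataclass
class Reference:
    authors: str = ""
    year: str = ""
    title: str = ""
    journal: str = ""
    doi: str = ""


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def completeness(returned: Reference, truth: Reference) -> float:
    """Share of fields that are present AND match the verified record."""
    hits = 0
    for name in FIELDS:
        value = getattr(returned, name)
        if value and _normalise(value) == _normalise(getattr(truth, name)):
            hits += 1
    return hits / len(FIELDS)


def mean_completeness(pairs: list[tuple[Reference, Reference]]) -> float:
    """Average completeness over (returned, truth) pairs; 0.0 if empty."""
    if not pairs:
        return 0.0
    return sum(completeness(r, t) for r, t in pairs) / len(pairs)


# Verified record: the study cited in this thread
truth = Reference(
    authors="Tiller NB, Marcon AR, Zenone M, et al.",
    year="2026",
    title="Generative AI-driven chatbots and medical misinformation: "
          "an accuracy, referencing and readability audit",
    journal="BMJ Open",
    doi="10.1136/bmjopen-2025-112695",
)

# Hypothetical chatbot citation: wrong year, placeholder DOI
returned = Reference(
    authors="Tiller NB, Marcon AR, Zenone M, et al.",
    year="2025",
    title=truth.title,
    journal="BMJ Open",
    doi="10.1136/bmjopen-2025-000000",
)

print(completeness(returned, truth))  # 0.6
```

A fabricated reference scores 0.0 on every field, while a citation that is plausible but wrong in its date and DOI still loses two of five points.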

6 days ago

Response quality. Blue = Non-problematic; Yellow = Somewhat problematic; Orange = Highly problematic. *Significantly more than expected at p < 0.05.

4/7 Results: Response QUALITY:

50% were “Problematic.”
-30% “Somewhat.”
-20% “Highly.”

Grok produced more “highly problematic” responses than expected (p = .038).

Only two refusals to answer out of 250 questions (0.8%).
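Converting the reported shares to counts makes the scale concrete. A small Python sketch, assuming 250 responses in total and an even split of 50 per chatbot (the split is an assumption). "Expected" here is simply the pooled highly-problematic rate applied to one chatbot, the baseline a chi-square-type comparison uses; the thread doesn't say which test produced p = .038.

```python
TOTAL_RESPONSES = 250
CHATBOTS = 5  # Gemini, DeepSeek, Meta AI, ChatGPT, Grok

# Reported shares of all responses (from the thread)
shares = {"somewhat problematic": 0.30, "highly problematic": 0.20}

counts = {label: round(share * TOTAL_RESPONSES) for label, share in shares.items()}
problematic = sum(counts.values())
refusal_rate = 2 / TOTAL_RESPONSES

# Assumption: equal allocation, so each chatbot answered 50 questions.
per_chatbot = TOTAL_RESPONSES // CHATBOTS
expected_highly_per_bot = shares["highly problematic"] * per_chatbot

print(counts)                                 # {'somewhat problematic': 75, 'highly problematic': 50}
print(problematic, f"{refusal_rate:.1%}")     # 125 0.8%
print(per_chatbot, expected_highly_per_bot)   # 50 10.0
```

Under these assumptions, a chatbot with no special tendency would produce about 10 highly problematic answers out of 50; "more than expected" means a count meaningfully above that.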

6 days ago

3/7 Assessments:

(i) Response Quality
Coded as “non-problematic,” “somewhat problematic,” or “highly problematic.”

(ii) Reference Accuracy
Number of references returned, completeness, and accuracy score

(iii) Readability
Flesch Reading Ease score

6 days ago

We asked five chatbots 250 questions across cancer, vaccines, stem cells, nutrition, and performance. 2/7

6 days ago


Our NEW study in @BMJ_Open is an audit of 𝐡𝐞𝐚𝐥𝐭𝐡 𝐦𝐢𝐬𝐢𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧 spread by popular AI chatbots.
Learn more🧵1/7

Thx to amazing team: @CaulfieldTim @srmarcon at
@UAlbertaLaw & @Jeukendrup @marco_zenone

🔗https://bmjopen.bmj.com/content/16/4/e112695
#health #misinformation #AI

6 days ago

https://skepticalinquirer.org/exclusive/is-it-time-we-stop-publishing-acupuncture-research-from-china/

99% of #acupuncture studies from China report benefits of the therapy. "This isn’t routine bias; it’s the systematic sterilization of negative outcomes."

Is it time we stop publishing acupuncture studies from China? New column in @SkeptInquirer.

2 weeks ago

https://www.independent.co.uk/travel/news-and-advice/everest-climbers-sherpas-fake-rescue-scam-poisoning-b2950597.html

"After trekkers reported nausea, dizziness or body aches, they were advised to descend and agree to costly emergency helicopter evacuations. Authorities said operators then used forged medical and flight documents to claim costs from international travel insurers."

2 weeks ago

A "good" VO2max for your age, such as the 60th–80th percentile, gives you all the "longevity" benefits the metric can provide. There is no lifespan advantage to being Olympic-level fit.

1 month ago

https://subscriber.ultrarunning.com/archive/issue/feb-mar-2026

New feature in the Feb/Mar issue of @UltraRunningMag 🏃‍♂️🏃‍♀️🏔️

2 months ago

The daily use of AI, in everything from smartphones to stoplights, can be traced back to a simple game of checkers, played by Arthur Samuel in 1959.

➡️gwern.net/doc/reinforcement-learni...

2 months ago

https://skepticalinquirer.org/exclusive/extraordinary-claims-the-homeopathy-paper-that-duped-a-mainstream-journal/

Bad science, lies, and possible fraud: The homeopathy paper that duped a mainstream journal. New column today in @SkeptInquirer #health #pseudoscience

Read it ⬇️⬇️: h/t @theliverdoc

2 months ago

FIVE PERCENT of Americans regularly consult psychic services, and nearly ONE-THIRD (30%) use them occasionally. Though high, that is actually lower than in many other high-income countries.

2 months ago

#quotes via @nbtiller.bsky.social

2 months ago

🧵7/7 And yet, the authors insist they have "no conflicts of interest to declare."

The lesson: Extraordinary claims require extraordinary evidence, not bad science and major undisclosed conflicts of interest.

END.

2 months ago

Red Flag: Professor Stride also co-founded Avrox, the company that funded the study. 🧵6/7

2 months ago

Red Flag: I also found a 2016 patent application for a “nanoencapsulated oxygen” beverage (Publication number 20180193260), in which Professor Eleanor Stride, one of the study authors, is listed as a co-inventor. 🧵5/7

2 months ago

Avrox features the article on its website, alongside pull quotes from the study authors. 🧵4/7

2 months ago

Red Flag: The study was funded by Avrox Technologies, a prominent vendor of oxygenated beverages. 🧵3/7

2 months ago

First, @Jeukendrup and I showed that the extra O2 supplied by the beverage (around 15 mL) was negligible compared with the volume inhaled by the respiratory system (around 150,000 mL), translating to an extra 0.09 watts of power. 🧵2/7
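The scale of that comparison is easy to check. A minimal Python sketch using the thread's figures (15 mL, 150,000 mL, 0.09 W, ~4%) plus one hypothetical value, an assumed average time-trial power of 250 W, used only to express the gains in watts and percent; it does not re-derive the 0.09 W estimate, whose assumptions aren't given here.

```python
# Figures from the thread
beverage_o2_ml = 15.0        # extra O2 supplied by the drink
inhaled_o2_ml = 150_000.0    # volume handled by the respiratory system
extra_power_w = 0.09         # estimated power gain from that extra O2
claimed_gain = 0.04          # ~4% improvement reported in the 16-km TT

# Hypothetical rider: assumed average time-trial power, not from the paper
assumed_tt_power_w = 250.0

o2_share = beverage_o2_ml / inhaled_o2_ml
physiological_gain = extra_power_w / assumed_tt_power_w
claimed_gain_w = claimed_gain * assumed_tt_power_w

print(f"Extra O2 share: {o2_share:.2%}")                        # Extra O2 share: 0.01%
print(f"0.09 W as share of power: {physiological_gain:.3%}")    # 0.09 W as share of power: 0.036%
print(f"Claimed gain: {claimed_gain_w:.0f} W, "
      f"~{claimed_gain_w / extra_power_w:.0f}x the O2 estimate")  # Claimed gain: 10 W, ~111x the O2 estimate
```

For this hypothetical rider, a ~4% gain would be about 10 W, roughly two orders of magnitude more than the extra oxygen could plausibly supply.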

2 months ago

A quick lesson in research conflicts of interest:
This 2024 paper in the Journal of Dietary Supplements showed that an oxygen "nanobubble beverage" improved power output in a 16-km cycling time trial (TT) by ~4%, and in repeated Wingate tests by ~7%. 🧵1/7

2 months ago

https://ourworldindata.org/grapher/political-polarization-score

Political polarization. Negative numbers (blue) reflect less polarization and friendlier political interactions. Positive numbers (red) reflect more division and hostile interactions.

Myanmar=3.72, US=1.79, Britain=-0.24, Norway=-2.1.

2 months ago

Don't criticize Attia to promote your own brand. That's fake and disingenuous: criticize him for being a dirtbag. And remember, he's far from the only wellness influencer contributing to the industry's ethical rot.

2 months ago

More acupuncture nonsense.
L.I.4 (Hegu, the joining valley) is said to "treat" an astonishing array of conditions, including mumps and "pain in the arm." This is medically impossible—out of step with everything we've learned about anatomy and physiology since Hippocrates.

2 months ago

Participants guess which group they're in, and the index quantifies how often they guess correctly. It ranges from −1 to +1, where zero is "perfect blinding."

0.6 in the acupuncture group means a strong tendency toward correct identification (among those who ventured a guess).
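The thread doesn't name the index, but a −1 to +1 range with zero as the blinding target matches Bang's blinding index. Assuming that index, a minimal Python sketch: per arm, BI = (correct − incorrect) / (correct + incorrect + don't know), which equals the net correct-guess rate among guessers scaled by the share of participants who guessed. The counts below are hypothetical, not the trial's data.

```python
def bang_blinding_index(correct: int, incorrect: int, dont_know: int = 0) -> float:
    """Bang's blinding index for one trial arm.

    +1: everyone guessed their own arm (blinding failed)
     0: guesses consistent with chance (blinding preserved)
    -1: everyone guessed the opposite arm
    """
    total = correct + incorrect + dont_know
    if total == 0:
        raise ValueError("no responses in this arm")
    # Equivalent to (2 * correct / guessed - 1) * (guessed / total):
    # net correct-guess rate among guessers, scaled by the share who guessed.
    return (correct - incorrect) / total


# Hypothetical arm: 80 correct, 20 incorrect, no "don't know" answers
print(bang_blinding_index(80, 20))      # 0.6
# Same guesses, plus 50 participants who answered "don't know"
print(bang_blinding_index(80, 20, 50))  # 0.4
```

Writing it as (correct − incorrect) / total avoids a division by zero when nobody ventures a guess, in which case the index is simply 0.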

2 months ago

Takeaway: Both groups improved, but acupuncture outperformed “sham” by only ~1 migraine day/month. Subjective outcomes, likely unblinding, and unadjusted statistics indicate that the findings are unlikely to be clinically meaningful. 🧵6/6

2 months ago

3. No correction for multiple testing (they used uncorrected t-tests), thereby increasing the risk of false positive findings. 🧵5/6
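Why uncorrected tests inflate false positives can be made concrete. A minimal Python sketch, assuming independent tests at α = 0.05: the chance of at least one false positive grows as 1 − (1 − α)^m, and Holm's step-down procedure is one standard correction (Bonferroni is another). The number of tests and the p-values are hypothetical; the trial's actual outcomes aren't listed here.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Hypothetical: 10 uncorrected t-tests, each at alpha = 0.05
print(round(familywise_error_rate(0.05, 10), 3))  # 0.401

# Hypothetical raw p-values from four outcomes
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 2) for p in holm_adjust(raw)])    # [0.04, 0.09, 0.09, 0.2]
```

With ten uncorrected tests, there is roughly a 40% chance of at least one spurious "significant" result; after Holm adjustment, only the smallest of the four hypothetical p-values stays below 0.05.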

2 months ago
