Posts by Nick Tiller, Ph.D.
It's one of ChatGPT's most insidious tells: "It's not X; it's Y" has become shorthand for lazy AI slop.
www.theguardian.com/commentisfre...
#AIWriting #ChatGPT
Tiller NB, Marcon AR, Zenone M, et al. Generative AI-driven chatbots and medical misinformation: an accuracy, referencing and readability audit. BMJ Open 2026;16:e112695. doi:10.1136/bmjopen-2025-112695
TAKE HOME MESSAGE:
-Chatbots perform poorly in misinformation-prone health and medical fields.
-Continued deployment without education and oversight risks amplifying misinformation. 7/7 END
Flesch Reading Ease scores. All scores 30-50: “Difficult” readability, equating to college sophomore to senior. Each data point is an individual response (25/chatbot), with mean ± STDEV. a = significantly different from Gemini; b = significantly different from DeepSeek; c = significantly different from Meta AI; d = significantly different from ChatGPT; e = significantly different from Grok.
6/7 Results: READABILITY
-Readability was graded as “Difficult” (Flesch Scores = 30–50).
-Equivalent to a college sophomore–senior level.
-Gemini 'slightly' better than the others.
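For readers curious how a Flesch Reading Ease score is computed, here is a minimal Python sketch. The formula constants are the standard published ones and the 30-50 "Difficult" band is the one quoted above. The syllable counter is a crude vowel-group heuristic and an assumption of this sketch, not the tool used in the study, and all function names are illustrative.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula, computed from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def score_text(text: str) -> float:
    """Score a passage using simple regex tokenisation (an approximation)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)


def is_difficult(score: float) -> bool:
    """True if the score falls in the 30-50 band labelled 'Difficult' above."""
    return 30 <= score <= 50


if __name__ == "__main__":
    # Hypothetical counts: 100 words, 4 sentences, 170 syllables.
    example = flesch_reading_ease(words=100, sentences=4, syllables=170)
    print(round(example, 2), is_difficult(example))
```

Long sentences and many-syllable words both pull the score down, which is why dense, jargon-heavy answers land in the "Difficult" band.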
Reference Completeness. Dark blue = % of references complete and correct; light blue = % of references incomplete and/or incorrect. a = significantly different from Gemini; b = significantly different from DeepSeek; c = significantly different from Meta AI; d = significantly different from ChatGPT; e = significantly different from Grok.
5/7 Results: Reference ACCURACY
-Completeness score averaged 40% (authors, dates, DOIs often wrong).
-Frequent hallucinations and fabricated citations.
-No chatbot produced a fully accurate reference list.
Response quality. Blue = Non-problematic; Yellow = Somewhat problematic; Orange = Highly problematic. *Significantly more than expected at p < 0.05.
4/7 Results: Response QUALITY:
50% were “Problematic.”
-30% “Somewhat.”
-20% “Highly.”
Grok produced more “highly problematic” responses than expected (p = .038).
Only two refusals to answer from 250 questions (0.8%).
3/7 Assessments:
(i) Response Quality
Coded as “non-problematic,” “somewhat problematic,” or “highly problematic.”
(ii) Reference Accuracy
References returned, completeness, accuracy score
(iii) Readability
Flesch Reading Ease score
We asked five chatbots 250 questions across cancer, vaccines, stem cells, nutrition, and performance. 2/7
https://bmjopen.bmj.com/content/16/4/e112695
Our NEW study in @BMJ_Open is an audit of 𝐡𝐞𝐚𝐥𝐭𝐡 𝐦𝐢𝐬𝐢𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧 spread by popular AI chatbots.
Learn more🧵1/7
Thx to amazing team: @CaulfieldTim @srmarcon at
@UAlbertaLaw & @Jeukendrup @marco_zenone
🔗https://bmjopen.bmj.com/content/16/4/e112695
#health #misinformation #AI
https://skepticalinquirer.org/exclusive/is-it-time-we-stop-publishing-acupuncture-research-from-china/
99% of #acupuncture studies from China report benefits of the therapy. "This isn’t routine bias; it’s the systematic sterilization of negative outcomes."
Is it time we stop publishing acupuncture studies from China? New column in @SkeptInquirer.
https://www.independent.co.uk/travel/news-and-advice/everest-climbers-sherpas-fake-rescue-scam-poisoning-b2950597.html
"After trekkers reported nausea, dizziness or body aches, they were advised to descend and agree to costly emergency helicopter evacuations. Authorities said operators then used forged medical and flight documents to claim costs from international travel insurers."
A "good" VO2max for your age, like 60th-80th percentile, gives you all the "longevity" benefits the metric will provide. There is no lifespan advantage to being Olympic-level fit.
https://subscriber.ultrarunning.com/archive/issue/feb-mar-2026
New feature in the Feb/Mar issue of @UltraRunningMag 🏃♂️🏃♀️🏔️
The daily use of AI, in everything from smartphones to stoplights, can be traced back to a simple game of checkers, played by Arthur Samuel in 1959.
➡️gwern.net/doc/reinforcement-learni...
https://skepticalinquirer.org/exclusive/extraordinary-claims-the-homeopathy-paper-that-duped-a-mainstream-journal/
Bad science, lies, and possible fraud: The homeopathy paper that duped a mainstream journal. New column today in @SkeptInquirer #health #pseudoscience
Read it ⬇️⬇️: h/t @theliverdoc
FIVE PERCENT of Americans regularly consult psychic services, with nearly ONE-THIRD (30%) using them occasionally. Though high, it's actually lower than in many other high-income developed countries.
#quotes via @nbtiller.bsky.social
🧵7/7 And yet, the authors insist they have "no conflicts of interest to declare."
The lesson: Extraordinary claims require extraordinary evidence, not bad science and major undisclosed conflicts of interest.
END.
Red Flag: She also co-founded Avrox—the company that funded the study. 🧵6/7
Red Flag: I also found a 2016 patent application for a “nanoencapsulated oxygen” beverage (Publication number 20180193260), in which Professor Eleanor Stride, one of the study authors, is listed as a co-inventor. 🧵5/7
Avrox features the article on its website, alongside pull quotes from the study authors. 🧵4/7
Red Flag: The study was funded by Avrox Technologies, a prominent vendor of oxygenated beverages. 🧵3/7
First, @Jeukendrup and I showed that the amount of extra O2 supplied by the beverage was negligible (around 15 mL) compared to the volume inhaled by the respiratory system (around 150,000 mL), translating to an extra 0.09 watts of power. 🧵2/7
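To put the two volumes in proportion, a quick Python check using only the approximate figures quoted in the post. The variable names are illustrative, and the 0.09 watt estimate is not re-derived here because the post does not give the conversion used.

```python
# Approximate volumes quoted in the post (mL).
EXTRA_O2_FROM_BEVERAGE_ML = 15
INHALED_VOLUME_ML = 150_000

# Share of the inhaled volume that the beverage's extra oxygen represents.
share_percent = EXTRA_O2_FROM_BEVERAGE_ML / INHALED_VOLUME_ML * 100

# How many times larger the inhaled volume is than the beverage's contribution.
ratio = INHALED_VOLUME_ML / EXTRA_O2_FROM_BEVERAGE_ML

if __name__ == "__main__":
    print(f"Beverage adds about {share_percent:.2f}% of the inhaled volume")
    print(f"Inhaled volume is about {ratio:,.0f} times larger")
```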
A quick lesson in research conflicts of interest:
This 2024 paper in the Journal of Dietary Supplements showed that an oxygen "nanobubble beverage" improved power output in a 16-km cycling TT by ~4%, and in repeated Wingates by ~7%. 🧵1/7
https://ourworldindata.org/grapher/political-polarization-score
Political polarization. Negative numbers (blue) reflect less polarization and more friendly political interactions. Positive numbers (red) reflect more division and hostile interactions.
Myanmar=3.72, US=1.79, Britain=-0.24, Norway=-2.1.
Don't criticize Attia to promote your own brand. That's fake and disingenuous: criticize him for being a dirtbag. And remember, he's far from the only wellness influencer contributing to the industry's ethical rot.
More acupuncture nonsense.
L.I.4 (Hegu, the joining valley) is said to "treat" an astonishing array of conditions, including mumps and "pain in the arm." This is medically impossible—out of step with everything we've learned about anatomy and physiology since Hippocrates.
Participants guess which group they're in, and the blinding index quantifies how often they guess correctly. It ranges from −1 to +1, and zero is "perfect blinding."
0.6 in the acupuncture group means a strong tendency toward correct identification (among those who ventured a guess).
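A minimal Python sketch of how an index like this can be computed for one trial arm. It assumes one common formulation (correct guesses minus incorrect guesses, divided by everyone asked, with "don't know" answers counted in the denominator). The post does not say exactly which variant the trial used, and the counts below are hypothetical.

```python
def blinding_index(correct: int, incorrect: int, dont_know: int = 0) -> float:
    """Per-arm blinding index under the assumed formulation.

    +1 means everyone guessed their allocation correctly (unblinded),
     0 means guesses look random (consistent with good blinding),
    -1 means everyone guessed the opposite arm.
    """
    total = correct + incorrect + dont_know
    if total == 0:
        raise ValueError("at least one participant is required")
    return (correct - incorrect) / total


if __name__ == "__main__":
    # Hypothetical arm: 80 correct guesses, 20 incorrect, nobody unsure.
    print(blinding_index(correct=80, incorrect=20))
    # Hypothetical arm where guesses split evenly.
    print(blinding_index(correct=50, incorrect=50))
```

With "don't know" answers in the denominator, a large undecided group pulls the index toward zero, so it matters whether a reported value refers to all participants or only to those who guessed.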
Takeaway: Both groups improved, but acupuncture outperformed “sham” by ~1 fewer migraine day/month. Subjective outcomes, likely unblinding, and unadjusted statistics indicate that the findings are unlikely to be clinically meaningful. 🧵6/6
3. No correction for multiple testing (they used uncorrected t-tests), thereby increasing the risk of false positive findings. 🧵5/6
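To show why uncorrected testing matters, here is a short Python sketch of the family-wise error rate plus two standard corrections (Bonferroni and Holm). It is generic illustration, not a re-analysis of the trial: the p-values are hypothetical, the function names are illustrative, and the error-rate formula assumes independent tests.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: test sorted p-values against progressively looser thresholds."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


if __name__ == "__main__":
    hypothetical_p = [0.01, 0.02, 0.03, 0.005]
    print(round(familywise_error(0.05, 10), 3))
    print(bonferroni(hypothetical_p))
    print(holm(hypothetical_p))
```

Holm controls the same family-wise error rate as Bonferroni but is never more conservative, which is why it is often preferred when several outcomes are compared.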