🔥 So much in this paper: CT-RATE (one of the largest public datasets of chest CT volumes paired with text reports), CT-CLIP (a 3D chest CT foundation model) and CT-CHAT, a conversational model built on CT-CLIP @ibrahimethem.bsky.social
Posts by Christian Bluethgen
🚨 Agentic Systems in Radiology
Everyone’s talking about agents 🕵️‍♀️🕵️‍♂️, but what happens when you bring them into real clinical workflows?
Our new preprint explores the design, applications, challenges, and evaluation of LLM-based agents and agentic workflows in radiology 👇
arxiv.org/abs/2510.09404
With amazing colleagues: @stanfordmedicine.bsky.social @stanfordaimi.bsky.social @curtlanglotz.bsky.social @akshay-chaudhari.bsky.social; @krauthammerlab.bsky.social; @ethz.ch @michaelmoor.bsky.social; @rwth.bsky.social @danieltruhn.bsky.social; @tudresden.bsky.social @jnkt.bsky.social & HOPPR
💥 We unveil our paper accepted at the #ACL2025 Main Conference:
Automated Structured Report Generation
Let's revisit automated radiology report generation for CXR.
Free-form reports make it hard for AI systems to learn accurate generation, and even harder to evaluate. 🧵👇
@StanfordAIMI @hopprai
👍 If you're interested in #LLMs in #radiology, this is a recommended read!
💯 While the article focuses primarily on LLMs, the authors recommend: "Keep an eye on [large multimodal models]".
👉 pubs.rsna.org/doi/10.1148/...
#RadiologyAI #AIStrategy #LMMs
When reading AI benchmarks, aside from the fact that many of the AIs are (accidentally or on purpose) trained on the test set, many tests are just bad. MMLU likely maxes out at 90% or so because so many of the questions in it are just wrong. It is also uncalibrated in difficulty across questions.
a lot of folks are talking about MCP and developers are rushing to support it, but it does seem a bit hype-y.
interesting perspective: hardcoresoftware.learningbyshipping.com/p/230-mcp-it...
that's a first :O
Chaired two insightful sessions at #ECR2025 today!
"Standardization and Reporting in AI Research" with experts Hans Reitsma Mike Klontzas Annika Reinke
"How to Use ChatGPT for Academic and Administrative Tasks" Andreas S. Brendlin @cxbln.bsky.social and Ghizlane Lembarki
@myesr.bsky.social
still thinking about this interaction with Claude (Oct '24) from time to time
r1ing away on an ordinary machine
... straight out of Kahneman's lesser-known book "Thinking... mostly slow"
For Germans unable to vote locally, it is easy to take part. Here's how: www.bundesregierung.de/breg-de/schw...
The Illustrated DeepSeek-R1
Spent the weekend reading the paper and sorting through the intuitions. Here's a visual guide and the main intuitions to understand the model and the process that created it.
newsletter.languagemodels.co/p/the-illust...
it's a journey
how’d you guess?
another excellent blog post by @chiphuyen.bsky.social, this time on #agents
huyenchip.com/2025/01/07/a...
People: please don't do ML for the sake of ML.
I keep seeing manuscripts using fancy machine learning on brain-imaging data, where, in my opinion (having processed a lot of brain-imaging data), the method is way too complex for the richness of the data.
Fancier is not better per se
my wishlist:
- fewer low-hanging proof-of-concept studies (yes, it works for your specialty/organ/classification, too)
- fewer head-to-head model comparisons (yes, Llama 3.1 was better than Llama 2 for your case, but peer review took 8 months, and now there’s Llama 4)
- instead: RCTs, relevant outcomes
thanks for sharing
"productization requires [standardization in development and deployment], which is antithetical to research. [...] PhDs are supposed to come up with innovative ideas, validate these ideas, report the findings to the community by writing papers and then move on."
[slightly edited to fit post limits]
A market research team's dream: gathering real-world use cases and improvement opportunities directly from natural product usage
from: www.anthropic.com/research/clio
NEW PREPRINT
A detailed overview of 32 popular predictive performance metrics for prediction models
arxiv.org/abs/2412.10288
this method doesn’t rely on the pin 📌 emoji, so it’s more private. instructions: bsky.app/profile/luci...