Posts by Christian Bluethgen

🔥 So much in this paper: CT-RATE (one of the largest public chest CT/text report datasets), CT-CLIP (a 3D chest CT foundation model) and CT-CHAT, a conversational model building on CT-CLIP @ibrahimethem.bsky.social

2 months ago 0 0 0 0

🚨 Agentic Systems in Radiology

Everyone’s talking about agents🕵️‍♀️🕵️‍♂️ — but what happens when you bring them into real clinical workflows?

Our new preprint explores the design, applications, challenges, and evaluation of LLM-based agents and agentic workflows in radiology 👇

arxiv.org/abs/2510.09404

5 months ago 8 2 1 0

With amazing colleagues: @stanfordmedicine.bsky.social @stanfordaimi.bsky.social @curtlanglotz.bsky.social @akshay-chaudhari.bsky.social; @krauthammerlab.bsky.social; @ethz.ch @michaelmoor.bsky.social; @rwth.bsky.social @danieltruhn.bsky.social; @tudresden.bsky.social @jnkt.bsky.social & HOPPR

5 months ago 1 0 0 0

💥 We unveil our paper accepted at the #ACL2025 Main Conference:
Automated Structured Report Generation

Let's revisit automated radiology report generation for CXR.
Free-form reports make it hard for AI systems to learn accurate generation, and even harder to evaluate. 🧵👇
@StanfordAIMI @hopprai

10 months ago 7 3 1 0

👍 If you're interested in #LLMs in #radiology, this is a recommended read!

💯 While the article focuses primarily on LLMs, as the authors recommended, "Keep an eye on [large multimodal models]".

👉 pubs.rsna.org/doi/10.1148/...
#RadiologyAI #AIStrategy #LMMs

11 months ago 4 1 0 0

When reading AI benchmarks, aside from the fact that many of the AIs are (accidentally or on purpose) trained on the test set, many tests are just bad. MMLU likely maxes out at 90% or so because so many of the questions in it are just wrong. It is also uncalibrated in difficulty across questions.

11 months ago 27 3 0 1
Large Language Models Pass the Turing Test
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5 minute conversations s...

LLMs formally pass the Turing test

arxiv.org/abs/2503.23674

1 year ago 4 0 0 0
230. MCP - It's Hot, But Will It Win?
There's a long history of "middleware" in our industry. Everyone wants it. There's always a hot one, but it rarely makes it to the finish line and often disappoints.

lot of folks talking about MCP and developers rushing to support it - but it does seem a bit hype-y.

interesting perspective: hardcoresoftware.learningbyshipping.com/p/230-mcp-it...

1 year ago 1 0 0 0

that's a first :O

1 year ago 1 0 1 0
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Reasoning is a critical frontier for advancing medical image analysis, where transparency and trustworthiness play a central role in both clinician trust and regulatory approval. Although Medical Visu...

arxiv.org/abs/2502.19634

1 year ago 2 0 1 0
A quarter of startups in YC’s current cohort have codebases that are almost entirely AI-generated
With the release of new AI models that are better at coding, developers are increasingly using AI to generate code. One of the newest examples is the current batch of Y Combinator, the storied Silicon Valley startup accelerator. A quarter of the W25…

A quarter of startups in YC’s current cohort have codebases that are almost entirely AI-generated

1 year ago 16 6 3 3
Chaired two insightful sessions at #ECR2025 today!

"Standardization and Reporting in AI Research" with experts Hans Reitsma, Mike Klontzas, and Annika Reinke

"How to Use ChatGPT for Academic and Administrative Tasks" with Andreas S. Brendlin, @cxbln.bsky.social, and Ghizlane Lembarki

@myesr.bsky.social

1 year ago 6 1 0 1

still thinking about this interaction with Claude (Oct '24) from time to time

1 year ago 0 0 0 0

r1ing away on an ordinary machine

.. straight out of Kahneman's lesser-known book "Thinking .. mostly slow"

1 year ago 2 0 0 0
Election Knowledge: Postal Voting | Federal Government (Bundesregierung)
Anyone who cannot vote in person on election day for the 2025 Bundestag election can vote in advance by postal ballot. All the key information on postal voting at a glance.

For Germans unable to vote locally, it is easy to take part. Here's how: www.bundesregierung.de/breg-de/schw...

1 year ago 1 0 0 0
Post image
1 year ago 1 0 0 0

The Illustrated DeepSeek-R1

Spent the weekend reading the paper and sorting through the intuitions. Here's a visual guide and the main intuitions to understand the model and the process that created it.

newsletter.languagemodels.co/p/the-illust...

1 year ago 75 23 1 4

it's a journey

1 year ago 1 0 0 0

how’d you guess?

1 year ago 0 0 0 0
Agents
Intelligent agents are considered by many to be the ultimate goal of AI. The classic book by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, 1995), defines ...

another excellent blog post by @chiphuyen.bsky.social, this time on #agents

huyenchip.com/2025/01/07/a...

1 year ago 1 0 0 0

People: please don't ML for the sake of ML.

I keep seeing manuscripts using fancy machine learning on brain-imaging data, where, in my opinion (having processed a lot of brain-imaging data), the method is way too complex for the richness of the data.

Fancier is not better per se

1 year ago 89 15 5 2

my wishlist:
- fewer low-hanging proof-of-concept studies (yes, it works for your specialty/organ/classification, too)
- fewer head-to-head model comparisons (yes, llama 3.1 was better than llama 2 for your case, but peer review took 8 months, now there’s llama 4)

- instead: RCTs, relevant outcomes

1 year ago 8 2 1 0
The promise and perils of synthetic data
Is it possible for an AI to be trained just on data generated by another AI? It might sound like a harebrained idea. But it’s one that’s been around for quite some time — and as new, real data is increasingly hard to come by, it’s been gaining traction.…

The promise and perils of synthetic data

1 year ago 28 12 1 0

thanks for sharing

1 year ago 0 0 0 0

"productization requires [standardization in development and deployment], which is antithetical to research. [...] PhDs are supposed to come up with innovative ideas, validate these ideas, report the findings to the community by writing papers and then move on."

[slightly edited to fit post limits]

1 year ago 3 0 0 0

A market research team's dream: gathering real-world use cases and improvement opportunities directly from natural product usage

from: www.anthropic.com/research/clio

1 year ago 2 0 0 0

NEW PREPRINT

A detailed overview of 32 popular predictive performance metrics for prediction models

arxiv.org/abs/2412.10288

1 year ago 192 65 11 6
Video
1 year ago 1 0 0 0

this method doesn’t rely on the pin 📌 emoji so it’s more private. instructions: bsky.app/profile/luci...

1 year ago 0 1 0 0