
Posts by Michael Moor

Instead of retraining / adapting reasoning models for every domain, we can plug in reward modules to steer reasoning toward higher reliability, which is needed especially in high-stakes settings.

14/14

1 week ago 0 0 0 0

Bigger picture:
🧠 reasoning models = general-purpose
📚 process reward agents = domain-specific grounding modules
→ PRA enables a decoupling of how we reason from what we know

13/

1 week ago 0 0 1 0

Results / Take-aways (more details in paper):
✅ Strong improvements on medical reasoning benchmarks
✅ Works across multiple frozen policy models
✅ Generalizes beyond the model it was paired with

12/

1 week ago 0 0 1 0

Why this matters:
In knowledge-intensive domains, correctness is not merely logical consistency
→ it requires alignment with external knowledge.

PRA proposes a way to bring this into the reasoning & reward loop
11/

1 week ago 1 0 1 0

This unlocks something important:

👉 Tree-search reasoning paired with grounded knowledge

Instead of committing to one chain of thought, PRA can:
→ explore multiple paths
→ retrieve different evidence per step
→ evaluate them in real time
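
To make the search idea concrete, here is a minimal beam-search sketch over reasoning paths. It is not the paper's implementation: `beam_search`, `toy_propose`, `toy_reward`, and the reward table are hypothetical names and values invented for illustration. The `reward` callable stands in for "retrieve evidence for this step, then score it".

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]

def beam_search(
    propose: Callable[[Path], List[str]],
    reward: Callable[[Path, str], float],
    depth: int,
    beam_width: int = 2,
) -> Path:
    """Expand several partial paths, score every new step immediately, keep the best."""
    beams: List[Tuple[float, Path]] = [(0.0, ())]
    for _ in range(depth):
        candidates: List[Tuple[float, Path]] = []
        for score, path in beams:
            for step in propose(path):
                # The step is rewarded as soon as it is proposed, not after the full trace.
                candidates.append((score + reward(path, step), path + (step,)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]

# Hypothetical step rewards: "x" looks better at first, but "y" leads to a stronger second step.
TOY_REWARDS: Dict[Tuple[Path, str], float] = {
    ((), "x"): 0.6,
    ((), "y"): 0.5,
    (("x",), "x2"): 0.1,
    (("y",), "y2"): 0.9,
}

def toy_propose(path: Path) -> List[str]:
    """Stand-in for a frozen policy model proposing next steps."""
    return [step for (prefix, step) in TOY_REWARDS if prefix == path]

def toy_reward(path: Path, step: str) -> float:
    """Stand-in for a grounded per-step reward."""
    return TOY_REWARDS[(path, step)]

explored = beam_search(toy_propose, toy_reward, depth=2, beam_width=2)
greedy = beam_search(toy_propose, toy_reward, depth=2, beam_width=1)
print(explored, greedy)
```

With a beam width of 1 the search commits to the locally best first step; with a width of 2 it keeps the alternative alive long enough to find the better overall path.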

10/

1 week ago 0 0 1 0

Key idea:
PRA turns PRMs into active agents that can:
• Check each reasoning step dynamically
• Query external knowledge (guidelines, textbooks, etc.)
• Provide immediate rewards during generation
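
A rough sketch of what "check, query, reward" per step could look like in code. Everything here is an assumption for illustration: the two `KNOWLEDGE` strings are placeholders (not medical guidance), and word overlap is only a toy proxy for a learned reward model. The paper's actual retrieval and scoring are not shown in this thread.

```python
import re

# Hypothetical toy knowledge base; a real system would query guidelines or textbooks.
KNOWLEDGE = [
    "Sepsis with hypotension requires early intravenous fluids and antibiotics.",
    "Beta blockers can worsen acute bronchospasm in asthma.",
]

def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(step, k=1):
    """Return the k passages sharing the most words with the step."""
    ranked = sorted(KNOWLEDGE, key=lambda p: len(tokens(p) & tokens(step)), reverse=True)
    return ranked[:k]

def step_reward(step):
    """Fraction of the step's words found in the retrieved evidence (toy proxy)."""
    words = tokens(step)
    if not words:
        return 0.0
    evidence = set().union(*(tokens(p) for p in retrieve(step)))
    return len(words & evidence) / len(words)

def reward_trace_online(steps):
    """Yield (step, reward) as each step arrives, rather than after the full trace."""
    for step in steps:
        yield step, step_reward(step)

trace = [
    "Give early intravenous fluids and antibiotics for sepsis",
    "Order a chest radiograph",
]
rewards = [r for _, r in reward_trace_online(trace)]
print(rewards)
```

Because `reward_trace_online` is a generator, a caller can stop or redirect generation as soon as a step scores poorly.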

9/

1 week ago 0 0 1 0

Here, we introduce Process Reward Agents (PRA).

A framework that uses PRMs online to search, reward, and guide reasoning as it unfolds.

8/

1 week ago 0 0 1 0

Grounded (i.e. retrieval-augmented) PRMs typically operate post-hoc:
They score reasoning only after the full trace is generated.

This means:
❌ no real-time feedback
❌ no flexible search (e.g. tree exploration)
❌ limited ability to steer reasoning as it happens
7/

1 week ago 0 0 1 0

Recent work combines PRMs with retrieval (e.g. Med-PRM):
- pull in external knowledge
- critique reasoning traces step-by-step using external sources

But there's a catch 👇
6/

1 week ago 0 0 1 0

One promising direction to go beyond final answers: Process Reward Models (PRMs)

Instead of only judging final answers, PRMs evaluate intermediate reasoning steps.

5/

1 week ago 0 0 1 0

This matters a lot.
In domains like medicine, LLM reasoning is not only about the final answer/decision:
we urgently need sound and defensible justifications along the way!

4/

1 week ago 0 0 1 0

But in knowledge-intensive domains:

- step correctness often depends on external knowledge (consensus, guidelines, textbooks, local constraints etc.) spread across various sources

- individual steps may not be easily verifiable in isolation

3/

1 week ago 0 0 1 0

In math/code, intermediate steps are often locally verifiable
→ you can more or less easily verify if a step is correct (e.g. formal rules, symbolic solvers, code compilation & execution etc.)
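
As a small illustration of local verifiability, an algebraic rewrite can be checked in isolation by evaluating both sides at a few points. This is a hedged sketch: `step_holds` is a hypothetical helper, and agreement on sample points is a quick heuristic check, not a proof (a symbolic solver would be the stricter option).

```python
def step_holds(before, after, samples=(-2.0, -0.5, 0.0, 1.0, 3.0), tol=1e-9):
    """Return True if the two expressions agree on every sample point."""
    return all(abs(before(x) - after(x)) <= tol for x in samples)

# A correct step: (x + 1)^2 -> x^2 + 2x + 1
ok = step_holds(lambda x: (x + 1) ** 2, lambda x: x ** 2 + 2 * x + 1)

# An incorrect step: (x + 1)^2 -> x^2 + 1
bad = step_holds(lambda x: (x + 1) ** 2, lambda x: x ** 2 + 1)

print(ok, bad)  # True False
```

No outside knowledge is needed here: the step itself contains everything required to check it.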

2/

1 week ago 0 0 1 0
Process Reward Agents for Steering Knowledge-Intensive Reasoning Reasoning in knowledge-intensive domains remains challenging as intermediate steps are often not locally verifiable: unlike math or code, evaluating step correctness may require synthesizing clues acr...

Preprint: arxiv.org/abs/2604.09482
Page: process-reward-agents.github.io
Code: github.com/eth-medical-...

Big thanks to a stellar team of co-authors: Jiwoong Sohn,
Tomasz Sternal, Kenneth Styppa, and Torsten Hoefler!
@ethz.ch

1/

1 week ago 0 0 1 0

[Preprint Alert] PROCESS REWARD AGENTS (PRA)

Why is it relatively easy to get LLMs to produce strong reasoning traces in math/code…
but much harder in application domains like health? And what can we do about it?

Check out our new paper & 🧵 below:

1 week ago 1 0 1 0

Check out Med-PRM, an approach for LLMs to verify each reasoning step against guidelines:

🔗 Page: med-prm.github.io
📄 Paper: arxiv.org/abs/2506.11474
🧠 Model: huggingface.co/dmis-lab/lla...
📚 Dataset: huggingface.co/datasets/dmi...
💻 Code: github.com/eth-medical-...
🧵 Thread: tinyurl.com/yu933dx6

10 months ago 2 0 0 0

Welcome to our new lab page 🚀
bsse.ethz.ch/mail

11 months ago 2 0 0 0

Great to see this out!

1 year ago 0 0 0 0

#AI agent labs are becoming better at producing autonomous research. Still, they operate in isolation w/o improving & interacting.

Here, we introduce AgentRxiv, where agent laboratories can upload & download latest research - which accelerates their progress:

Great effort led by Samuel Schmidgall!

1 year ago 3 1 0 0

Today, our two new faculty members held their inaugural lectures @ethzurich.bsky.social. Basile Wicky | Biomedical Design Lab, presented on designing proteins that interface with life; Michael Moor @michaelmoor.bsky.social | Medical AI Lab, spoke about AI in medicine. Recordings > u.ethz.ch/hQdQl

1 year ago 3 2 0 0
Faculty of AI and Scientific Computing in Medicine (AISCM) Medical University of Innsbruck Faculty of AI and Scientific Computing in Medicine

Today @michaelmoor.bsky.social took a train from Zurich to Innsbruck to kick-off our new Faculty of "AI and Scientific Computing"! 🎉 He talked about LLMs and Medical AI Agents. Exciting science and great discussions! Thanks! More info about our faculty: aiscm.i-med.ac.at #ai #scientificcomputing

1 year ago 2 1 0 0

Let's get paper sharing started here! I'll start:

Interesting new preprint on Multimodal medical preference optimization:
arxiv.org/pdf/2412.06141
@huaxiuyaoml.bsky.social (and others)

1 year ago 1 0 0 0

How can we build an AI virtual cell that simulates all functions and interactions of a cell? How will it transform research and drive breakthroughs in programmable biology, drug discovery and personalized medicine?

Take a look at our paper in @cellpress.bsky.social!
www.cell.com/cell/fulltex...

1 year ago 89 21 3 3

UI / UX suggestion for #bluesky:

I would remove the ".bsky.social" string that clutters the app. Like when looking at a list of n accounts, one has to visually ignore this suffix n times.

I suspect that small UI things like this could make a big impact in getting more momentum.

1 year ago 2 0 0 0

OpenAI coming to Switzerland! Congrats on the new roles!

1 year ago 5 0 0 0

Hello, world! 🤩

1 year ago 4 0 0 0

There is a new blue animal in town, it can fly but is not a bird #Xodus

1 year ago 1 0 0 0

Any #NewPI out there who just joined? Happy to connect! 🚀 🦋

1 year ago 3 0 0 0

Corrected link: bsky.app/profile/mich...

1 year ago 0 0 0 0

Finally figured out how to create a starter pack yay 😅

go.bsky.app/SNnu3ev

Just added a bunch of folks I could quickly find, far from exhaustive..

1 year ago 8 2 4 1