Instead of retraining / adapting reasoning models for every domain, we can plug in reward modules to steer reasoning toward higher reliability, which is needed especially in high-stakes settings.
14/14
Posts by Michael Moor
Bigger picture:
🧠 reasoning models = general-purpose
process reward agents = domain-specific grounding modules
→ PRA enables a decoupling of how we reason from what we know
13/
Results / Take-aways (more details in paper):
- Strong improvements on medical reasoning benchmarks
- Works across multiple frozen policy models
- Generalizes beyond the model it was paired with
12/
Why this matters:
In knowledge-intensive domains, correctness is not merely logical consistency
→ it requires alignment with external knowledge.
PRA proposes a way to bring this into the reasoning & reward loop
11/
This unlocks something important:
Tree-search reasoning paired with grounded knowledge
Instead of committing to one chain of thought, PRA can:
→ explore multiple paths
→ retrieve different evidence per step
→ evaluate them in real time
10/
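A minimal sketch of what step-level search with a grounded reward could look like, not the authors' implementation. The evidence passages, the `propose` stand-in for a frozen policy model, and the word-overlap reward are all assumptions made for illustration; a real system would use a trained reward model and a proper retriever.

```python
import re

# Hypothetical evidence store; stands in for guidelines or textbooks.
EVIDENCE = [
    "chest pain with st elevation suggests myocardial infarction",
    "myocardial infarction requires urgent reperfusion",
]

def tokens(text):
    """Lowercased word set used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def step_reward(step):
    """Toy grounded reward: best word-overlap ratio with any evidence passage."""
    words = tokens(step)
    if not words:
        return 0.0
    return max(len(words & tokens(p)) / len(words) for p in EVIDENCE)

def propose(path):
    """Hypothetical stand-in for a frozen policy proposing next steps."""
    options = {
        0: ["st elevation suggests myocardial infarction",
            "chest pain is likely muscle strain"],
        1: ["myocardial infarction requires urgent reperfusion",
            "send the patient home"],
    }
    return options.get(len(path), [])

def beam_search(beam_width=2, depth=2):
    """Keep the beam_width highest-reward partial paths at each depth."""
    beams = [([], 0.0)]
    for _ in range(depth):
        candidates = []
        for path, score in beams:
            for step in propose(path):
                # Each candidate step is rewarded as soon as it is proposed.
                candidates.append((path + [step], score + step_reward(step)))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

best_path, best_score = beam_search()[0]
```

Here `best_path` is the chain whose steps are best supported by the toy evidence; the weaker branches are pruned during search rather than after a full trace is written.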
Key idea:
PRA turns PRMs into active agents that can:
• Check each reasoning step dynamically
• Query external knowledge (guidelines, textbooks, etc.)
• Provide immediate rewards during generation
9/
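To make the three capabilities above concrete, here is a toy sketch (an illustration under stated assumptions, not the paper's method). For each reasoning step it retrieves a passage from a small in-memory knowledge base and returns a reward right away. `KNOWLEDGE`, `retrieve`, `step_reward`, `score_online`, and the word-overlap scoring are hypothetical names and choices made for this example.

```python
import re

# Hypothetical mini knowledge base standing in for guidelines/textbooks.
KNOWLEDGE = [
    "metformin is first-line therapy for type 2 diabetes",
    "metformin is contraindicated in severe renal impairment",
    "insulin is required in type 1 diabetes",
]

def tokens(text):
    """Lowercased word set used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(step, corpus=KNOWLEDGE, k=1):
    """Return the k passages sharing the most words with the step."""
    ranked = sorted(corpus, key=lambda p: len(tokens(p) & tokens(step)),
                    reverse=True)
    return ranked[:k]

def step_reward(step, corpus=KNOWLEDGE):
    """Toy reward: fraction of the step's words found in the best passage."""
    words = tokens(step)
    if not words:
        return 0.0
    evidence = retrieve(step, corpus, k=1)[0]
    return len(words & tokens(evidence)) / len(words)

def score_online(steps):
    """Yield (step, reward) as each step arrives, i.e. immediate feedback."""
    for step in steps:
        yield step, step_reward(step)

rewards = dict(score_online([
    "metformin is first-line therapy",
    "aspirin cures diabetes",
]))
```

Because `score_online` is a generator, each step is scored before the next one is consumed, which is the property that separates online scoring from post-hoc scoring of a finished trace.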
Here, we introduce Process Reward Agents (PRA).
A framework that uses PRMs online to search, reward, and guide reasoning as it unfolds.
8/
Grounded (i.e. retrieval-augmented) PRMs typically operate post-hoc:
They score reasoning only after the full trace is generated.
This means:
- no real-time feedback
- no flexible search (e.g. tree exploration)
- limited ability to steer reasoning as it happens
7/
Recent work combines PRMs with retrieval (e.g. Med-PRM):
- pull in external knowledge
- critique reasoning traces step-by-step using external sources
But there's a catch:
6/
One promising direction to go beyond final answers: Process Reward Models (PRMs)
Instead of only judging final answers, PRMs evaluate intermediate reasoning steps.
5/
This matters a lot.
In domains like medicine, LLM reasoning is not only about the final answer/decision: we urgently need sound and defensible justifications along the way!
4/
But in knowledge-intensive domains:
- step correctness often depends on external knowledge (consensus, guidelines, textbooks, local constraints etc.) spread across various sources
- individual steps may not be easily verifiable in isolation
3/
In math/code, intermediate steps are often locally verifiable
→ you can more or less easily verify whether a step is correct (e.g. formal rules, symbolic solvers, code compilation & execution, etc.)
2/
Preprint: arxiv.org/abs/2604.09482
Page: process-reward-agents.github.io
Code: github.com/eth-medical-...
Big thanks to a stellar team of co-authors: Jiwoong Sohn,
Tomasz Sternal, Kenneth Styppa, and Torsten Hoefler!
@ethz.ch
1/
[Preprint Alert] Process Reward Agents (PRA)
Why is it relatively easy to get LLMs to produce strong reasoning traces in math/code…
but much harder in application domains like health? And what can we do about it?
Check out our new paper & 🧵 below:
Check out Med-PRM, an approach for LLMs to verify each reasoning step against guidelines:
Page: med-prm.github.io
Paper: arxiv.org/abs/2506.11474
🧠 Model: huggingface.co/dmis-lab/lla...
Dataset: huggingface.co/datasets/dmi...
💻 Code: github.com/eth-medical-...
🧵 Thread: tinyurl.com/yu933dx6
Welcome to our new lab page
bsse.ethz.ch/mail
Great to see this out!
#AI agent labs are becoming better at producing autonomous research. Still, they operate in isolation w/o improving & interacting.
Here, we introduce AgentRxiv, where agent laboratories can upload & download the latest research, which accelerates their progress:
Great effort led by Samuel Schmidgall!
Today, our two new faculty members held their inaugural lectures @ethzurich.bsky.social. Basile Wicky | Biomedical Design Lab, presented on designing proteins that interface with life; Michael Moor @michaelmoor.bsky.social | Medical AI Lab, spoke about AI in medicine. Recordings > u.ethz.ch/hQdQl
Today @michaelmoor.bsky.social took a train from Zurich to Innsbruck to kick off our new Faculty of "AI and Scientific Computing"! He talked about LLMs and Medical AI Agents. Exciting science and great discussions! Thanks! More info about our faculty: aiscm.i-med.ac.at #ai #scientificcomputing
Let's get paper sharing started here! I'll start:
Interesting new preprint on Multimodal medical preference optimization:
arxiv.org/pdf/2412.06141
@huaxiuyaoml.bsky.social (and others)
How can we build an AI virtual cell that simulates all functions and interactions of a cell? How will it transform research and drive breakthroughs in programmable biology, drug discovery and personalized medicine?
Take a look at our paper in @cellpress.bsky.social!
www.cell.com/cell/fulltex...
UI / UX suggestion for #bluesky:
I would remove the ".bsky.social" string that clutters the app. Like when looking at a list of n accounts, one has to visually ignore this suffix n times.
I suspect that small UI things like this could make a big impact in getting more momentum.
OpenAI coming to Switzerland! Congrats on the new roles!
Hello, world! 🤩
There is a new blue animal in town, it can fly but is not a bird #Xodus
Any #NewPI out there who just joined? Happy to connect! 🦋
Corrected link: bsky.app/profile/mich...
Finally figured out how to create a starter pack, yay
go.bsky.app/SNnu3ev
Just added a bunch of folks I could quickly find, far from exhaustive..