
Posts by Daniel Brown

This work was led by my PhD student Connor Mattson with Varun Raveendra, Ellen Novoseller, Nicholas Waytowich, and Vernon J. Lawhern.

10/N

4 weeks ago

Our paper highlights a new method for teaching robots that is designed around what humans can feasibly demonstrate. Lots of exciting follow-up work to come in this area!

9/N

4 weeks ago

- On physical robots, R2BC outperforms centralized BC by 3.25x and 5.9x on two cooperative tasks.
- The variability introduced by autonomous teammates during training acts as a natural form of data augmentation, making R2BC policies more robust.

8/N

4 weeks ago

Some of our exciting findings indicate that:
- R2BC matches or outperforms a privileged joint-action behavior cloning baseline across four simulated tasks, despite never seeing a single centralized demonstration.

7/N

4 weeks ago

Not only does this work, but our method is also simpler than, and outperforms, an "oracle" baseline in which a demonstrator controls the whole team simultaneously in an offline setting.

6/N

4 weeks ago

In our method, Round-Robin Behavior Cloning (R2BC), the human controls one agent while the others run their current learned policies. The demonstrator then rotates and teaches the next robot. This process repeats while agents periodically update their policies on the demonstrations collected so far.
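
The loop above can be sketched in a few lines of Python. This is a minimal toy illustration only: `ToyPolicy`, `ToyEnv`, and `demo_fn` are hypothetical stand-ins, not names from our actual implementation.

```python
# Minimal toy sketch of the round-robin loop: the human drives one agent
# while teammates run their current policies, then the teacher rotates.

class ToyPolicy:
    """Stand-in for a behavior-cloned policy: memorizes demonstrated actions."""
    def __init__(self):
        self.table = {}

    def act(self, obs):
        return self.table.get(obs, 0)          # default action if obs unseen

    def update(self, dataset):
        for obs, act in dataset:               # "train" on collected demos
            self.table[obs] = act

class ToyEnv:
    """Two-agent environment whose shared observation is a step counter."""
    def __init__(self, n_agents=2):
        self.n_agents, self.t = n_agents, 0

    def reset(self):
        self.t = 0
        return [self.t] * self.n_agents

    def step(self, actions):
        self.t += 1
        return [self.t] * self.n_agents

def r2bc(policies, env, demo_fn, rounds=2, steps=5):
    datasets = [[] for _ in policies]
    for _ in range(rounds):
        for i in range(len(policies)):          # rotate the human teacher
            obs = env.reset()
            for _ in range(steps):
                # teammates execute their current learned policies ...
                actions = [p.act(o) for p, o in zip(policies, obs)]
                actions[i] = demo_fn(obs[i])    # ... while the human drives agent i
                datasets[i].append((obs[i], actions[i]))
                obs = env.step(actions)
            policies[i].update(datasets[i])     # periodic policy update
    return policies

policies = r2bc([ToyPolicy(), ToyPolicy()], ToyEnv(), demo_fn=lambda o: o + 1)
```

After two rounds, each agent has learned the demonstrated mapping for every observation it was taught on, without any centralized joint-action demonstration.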

5/N

4 weeks ago

We tackled the following question: How can we teach robot teams if a human can only provide demonstrations to one robot at a time?

4/N

4 weeks ago

Problem: Teaching a team of robots by demonstration is hard. A single human cannot teleoperate many robots at once and expect any of them to learn something useful. However, prior work has unrealistically assumed that a single human can provide these joint demos to agents!

3/N

4 weeks ago

R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

Full Paper: arxiv.org/abs/2510.18085
Code and Videos: sites.google.com/view/r2bc/home

2/N

4 weeks ago

How can you do imitation learning for multi-agent systems if the demonstrator is just a single human? Unless you're a Doctor Octopus-level genius, this is really hard! We study this problem in a new paper that will be at ICRA'26!

1/N

4 weeks ago
Assistant Professor for The Kahlert School of Computing

The Kahlert School of Computing at the University of Utah is hiring for multiple faculty positions! We're especially interested in growing in the areas of human-centered AI and robotics! utah.peopleadmin.com/postings/190...

5 months ago

We hope this work can help inspire the development of better AI alignment tests and evaluations for LLM reward models.

Check out the workshop paper here: anamarasovic.com/publications...

8/8

6 months ago

We applied this approach to RewardBench and found evidence that much of the data in safety and reasoning datasets may be redundant (44% for safety and 24% for reasoning) and that this can lead to inflated alignment scores.

7/8

6 months ago

By scaling up these ideas to LLMs, we can now estimate the set of reward model weights (weights that map the last decoder hidden state to a scalar output) that are consistent with a preference alignment dataset and also identify redundant and non-redundant examples in the preference dataset.

6/8

6 months ago

Once you find these core demonstrations or comparisons, you can use them to craft efficient alignment tests. But until recently, we were only able to empirically test these ideas on simple toy domains.

5/8

6 months ago

The main idea was that for linear rewards, we can determine, via an intersection of half-spaces, the set of reward functions that make a policy optimal and that this set of rewards is defined by a small number of "non-redundant" demonstrations or comparisons.
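
For intuition, here is a tiny 2D sketch of the redundancy idea (illustrative only; the paper's machinery is more general). Each comparison induces a half-space constraint through the origin on the reward weights, and a constraint is redundant if it is implied by the others, which this toy version checks by densely sampling reward directions:

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def redundant(i, normals, n_dirs=360):
    """Each comparison "A preferred to B" constrains a linear reward w via
    w . a >= 0, where a = features(A) - features(B). Constraint i is
    redundant iff no reward direction satisfies all the other constraints
    while violating constraint i (checked here by dense 2D sampling)."""
    others = [a for j, a in enumerate(normals) if j != i]
    for k in range(n_dirs):
        th = 2 * math.pi * k / n_dirs
        w = (math.cos(th), math.sin(th))
        if all(dot(w, a) >= 0 for a in others) and dot(w, normals[i]) < -1e-9:
            return False   # found a witness reward: constraint i is needed
    return True

# Two orthogonal constraints pin the feasible rewards to a quadrant;
# the third normal lies inside that cone, so its constraint is redundant.
normals = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

Dropping the redundant comparison leaves the feasible set of reward functions unchanged, which is exactly why such examples can be removed from an alignment test without losing information.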

4/8

6 months ago

It was a fun paper and has some interesting nuggets, like the fact that there exist sufficient conditions under which we can verify exact and approximate AI alignment across an infinite set of deployment environments via a constant-query-complexity test.

3/8

6 months ago

As some background, a couple of years ago I worked with Jordan Schneider, @scottniekum.bsky.social, and Anca Dragan on what we called "Value Alignment Verification" with the goal of efficiently testing whether an AI system is aligned with human values.
arxiv.org/abs/2012.01557

2/8

6 months ago

Can you trust your reward model alignment scores?
New work presented today at the COLM Workshop on Socially Responsible Language Modelling Research, led by Purbid Bambroo in collaboration with @anamarasovic.bsky.social, that probes LLM preference test sets for redundancy and inflated scores.

1/8

6 months ago

This was a really fun collaboration with Jordan Thompson, Britton Jordan, and Alan Kuntz.

Check out our paper here: openreview.net/forum?id=K7K...

5/5

6 months ago

Our approach also enables uncertainty attribution! We can backpropagate uncertainty estimates into an input point cloud to visualize and interpret the robot's uncertainty.

If you're at #CoRL25, check out Jordan Thompson's talk and poster (Spotlight 6 & Poster 3).

4/5

6 months ago

We apply our approach to surgically-inspired deformable tissue manipulation and find it achieves a 10% lower reliance on human interventions compared to prior work that leverages variance-based uncertainty estimates.

3/5

6 months ago

Inspired by prior work on active, uncertainty-aware human-robot hand-offs like Ryan Hoque and @ken-goldberg.bsky.social's ThriftyDAgger (arxiv.org/abs/2109.08273), we show that agreement volatility enables robots to know when they need help so they can request appropriate human interventions.
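
One way to picture the resulting intervention logic, as a rough sketch and my own illustrative reading rather than the paper's exact formulation: a first-order signal measures how much an ensemble currently agrees, and the second-order signal thresholds how much that agreement has been fluctuating over a recent window.

```python
from statistics import pvariance

def agreement(ensemble_actions):
    """First-order signal: negative spread of the ensemble's (1D) actions.
    Higher means the models agree more."""
    return -pvariance(ensemble_actions)

def should_request_help(agreement_history, window=5, threshold=0.5):
    """Second-order signal (illustrative): request a human intervention
    when agreement has been *volatile* (high variance) over the recent
    window, rather than merely low at the current step."""
    recent = agreement_history[-window:]
    if len(recent) < window:
        return False                     # not enough history yet
    return pvariance(recent) > threshold
```

Under this reading, a steadily low agreement trace can stay autonomous, while a rapidly fluctuating one triggers a hand-off to the human.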

2/5

6 months ago

Check out our new paper being presented today at #CoRL2025 on uncertainty quantification: openreview.net/forum?id=K7K...
We propose a new second-order metric for uncertainty quantification in robot learning that we call "Agreement Volatility."

1/5

6 months ago

Toward robust, interactive, and human-aligned AI systems

Excited to announce that my lab's research was recently highlighted in an AI Magazine article: onlinelibrary.wiley.com/doi/pdf/10.1...

6 months ago

If you're in Melbourne, come check out Connor's talk in the Teleoperation and Shared Control session today!

Paper: arxiv.org/abs/2501.08389
Website: sites.google.com/view/zerosho...

This is joint work with two of my other amazing PhD students Zohre Karimi and Atharv Belsare!

3/3

1 year ago

We study how to enable robots to use end-effector vision to estimate zero-shot human intents in conjunction with blended control to help humans accomplish manipulation tasks like grocery shelving with unknown and dynamically changing object locations.
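
The blended-control piece follows the standard linear-arbitration pattern from the shared autonomy literature; the sketch below is illustrative (VOSA's exact arbitration law may differ): the robot's assistive command toward the inferred intent is weighted by its confidence in that intent.

```python
def blend(u_human, u_robot, confidence):
    """Linearly arbitrate between the human's teleoperation command and
    the robot's assistive command, weighting assistance by an intent
    confidence clamped to [0, 1]."""
    a = max(0.0, min(1.0, confidence))   # clamp confidence
    return [a * r + (1.0 - a) * h for h, r in zip(u_human, u_robot)]
```

With zero confidence the human keeps full control; as the vision-based intent estimate sharpens, the robot contributes more of the commanded motion.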

2/3

1 year ago

Shared autonomy systems have been around for a long time but most approaches require a learned or specified set of possible human goals or intents. I'm excited for my student Connor Mattson to present our work at #HRI2025 on a zero-shot, vision-only shared autonomy (VOSA) framework.

1/3

1 year ago

Excited to announce that I've been invited to give a talk at AAAI-25 on "Leveraging Human Input to Enable Robust, Interactive, and Aligned AI Systems" as part of their New Faculty Highlights program!

1 year ago