This year's ACM/SIGAI Autonomous Agents Research Award goes to Prof. Shlomo Zilberstein. His work on decentralized Markov Decision Processes laid the foundation for decision-theoretic planning in multi-agent systems and multi-agent reinforcement learning.
sigai.acm.org/main/2025/03...
#SIGAIAward
Posts by Roy Fox
I hear that the other site has been undergoing a Distributed Disinterest in Service attack.
I had one who was essentially head of Sales.
Exciting news - early bird registration is now open for #RLDM2025!
🔗 Register now: forms.gle/QZS1GkZhYGRF...
Register now to save €100 on your ticket. Early bird prices are only available until 1st April.
to the above, I'd add Offline RL (I start with AWR, then IQL and CQL)
2025 is looking to be the year that information-theoretic principles in sequential decision making finally make a comeback! (At least for me; I know others never stopped.) Already 4 very exciting projects, and counting!
I received an email from the Department of Energy stating that “DOE is moving aggressively to implement this Executive Order by directing the suspension of [...] DEI policies [...] Community Benefits Plans [... and] Justice40 requirements”.
This probably explains the NSF panel suspensions as well.
Screenshot of open roles at Fauna Robotics
Want a job in robotics in New York? faunarobotics.com
Quick links to the 2024 reviewed works:
1. bsky.app/profile/royf...
2. bsky.app/profile/royf...
3. bsky.app/profile/royf...
4. bsky.app/profile/royf...
5. bsky.app/profile/royf...
2. Using RL to guide search. We called it Q* before OpenAI made that name famous.
“Q* Search: Heuristic Search with Deep Q-Networks”, by Forest Agostinelli, in collaboration with Shahaf Shperberg, Alexander Shmakov, Stephen McAleer, and Pierre Baldi. PRL @ ICAPS 2024.
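The core idea of search guided by a Q-network can be sketched as below — a minimal, hypothetical best-first search where the heuristic for all of a node's children comes from a single call to a `q_values` function (standing in for one forward pass of a deep Q-network); the interfaces and unit edge costs are illustrative assumptions, not the paper's implementation.

```python
import heapq
import itertools

def q_star_search(start, goal_test, successors, q_values, weight=1.0):
    """Best-first search where a learned Q-function supplies the heuristic.

    q_values(state) is assumed to return {action: estimated cost-to-go of
    taking `action` in `state`} from ONE network call, so every child of a
    node is scored without evaluating each child state separately.
    """
    counter = itertools.count()                   # tie-breaker for the heap
    frontier = [(0.0, next(counter), start, [])]  # (f, tie, state, action path)
    best_g = {start: 0.0}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path
        g = best_g[state]
        q = q_values(state)              # one call scores all actions at once
        for action, child in successors(state):
            child_g = g + 1.0            # unit edge cost, for illustration
            if child_g < best_g.get(child, float("inf")):
                best_g[child] = child_g
                # priority f = g + weighted Q-value of the action taken
                heapq.heappush(frontier, (child_g + weight * q[action],
                                          next(counter), child, path + [action]))
    return None                          # no path found
```

With `weight=1.0` and an admissible Q-estimate this behaves like A*; larger weights trade optimality for speed, as in weighted A*.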
1. Using segmentation foundation models to overcome distractions in model-based RL.
“Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control with Distraction”, by Kyungmin Kim, in collaboration with Charless Fowlkes. TAFM @ RLC 2024.
Our 2024 research review isn't complete without mentioning 2 workshop papers that preview upcoming publications; I'll leave other things happening as surprises for 2025.
Davide Corsi @dcorsi.bsky.social, a rising star in Safe Robot Learning, led this work in collaboration with Guy Amir, Andoni Rodríguez, César Sánchez, and Guy Katz, published in RLC 2024. Not to be confused with Davide's other work in RLC 2024, for which he won a Best Paper Award (see below).
If the unsafe state space is small, and the boundary simplification is careful not to expand it much, the result is that we can safely run the policy and only rarely invoke the shield on unsafe states, leading to significant speedup with safety guarantees.
The trick is to use offline verification not only to label an entire policy safe/unsafe, but to label each state safe/unsafe, according to whether the policy's action there satisfies or violates the safety constraints. The resulting partition is complex, so we simplify it while guaranteeing no false negatives (no unsafe state labeled safe).
Online verification can be slow, but it's more useful than offline verification: it's easier to replace occasional unsafe actions than entire unsafe policies. And unsafe actions are often rare, only reducing optimality a little. But it's costly that we need to run the shield on every action, even if it turns out safe.
Given a control policy (say, a reinforcement-learned neural network) and a set of safety constraints, there are 2 ways to verify safety: offline, where the policy is verified to always output safe actions; and online, where a “shield” intercepts unsafe actions and replaces them with safe ones.
Last in our 2024 research review: control with efficient safety guarantees. Formal verification methods are very slow, but here's a cool trick to use them for safe control, with minimal slowdown and provable safety guarantees.
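The runtime side of this combination can be sketched as below — a hypothetical wrapper, assuming `safe_region` is the cheap, conservatively simplified membership test precomputed by offline verification (no false negatives: every state it labels safe truly is), and `shield` is the slow online fallback that returns a verified-safe action. These names and signatures are illustrative, not the paper's API.

```python
def make_fast_shielded_policy(policy, safe_region, shield):
    """Wrap a policy so the expensive shield runs only on states whose
    action was not already verified safe offline."""
    def act(state):
        if safe_region(state):
            return policy(state)     # fast path: no online verification needed
        return shield(state)         # rare slow path on unsafe-labeled states
    return act
```

If the unsafe-labeled region is small, almost every step takes the fast path, which is where the speedup with provable safety comes from.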
Led by the fantastic Armin Karamzade in collaboration with Kyungmin Kim and Montek Kalsi, this work was published in RLC 2024.
This method works well for short delays, but gets worse as the WM drifts over longer horizons than it was trained for. For longer delays, our experiments suggest a simpler method that directly conditions the policy on the delayed WM state and the following actions.
This suggests several delayed model-based RL methods. Most interestingly, when observations are delayed, we can use the WM to imagine how recent actions could have affected the world state, in order to choose the next action.
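The imagination step just described can be sketched as below — assuming a hypothetical world model with a deterministic latent transition `wm_step(latent, action) -> latent` and a `policy(latent) -> action`; real WMs are learned and stochastic, so this is only the shape of the idea.

```python
def act_under_delay(wm_step, policy, delayed_latent, pending_actions):
    """Roll the world model forward through the actions already sent but
    not yet observed, then act on the imagined current latent state."""
    latent = delayed_latent
    for a in pending_actions:        # imagine the effect of in-flight actions
        latent = wm_step(latent, a)
    return policy(latent)
```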
But real-world control problems are often partially observable. Can we use the structure of delayed POMDPs? Recent world modeling (WM) methods have a cool property: they can learn an MDP model of a POMDP. We show that for a good WM of an undelayed POMDP, the delayed WM models the delayed POMDP.
Previous works have noticed some important modeling tricks. First, delays can be modeled as just partial observability (POMDP), but generic POMDPs lose the nice temporal structure provided by delays. Second, a delayed MDP is still an MDP over a larger state space — exponentially larger in the delay, but it keeps the structure.
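That second trick — augmenting the state with the in-flight actions — can be sketched as below, for observation delay `d`: the augmented state is (the state observed `d` steps ago, the `d` actions taken since). The `env` interface here (`reset() -> state`, `step(action) -> state`) is a simplifying assumption for illustration.

```python
from collections import deque

class DelayedMDPWrapper:
    """Turn an MDP with observation delay `delay` into an MDP over
    augmented states (last observed state, pending actions)."""

    def __init__(self, env, delay, noop_action):
        self.env, self.delay, self.noop = env, delay, noop_action

    def reset(self):
        s = self.env.reset()
        # pad the action queue with no-ops before any real action is taken
        self.pending = deque([self.noop] * self.delay, maxlen=self.delay)
        self.obs_buffer = deque([s], maxlen=self.delay + 1)
        return (s, tuple(self.pending))

    def step(self, action):
        s = self.env.step(action)
        self.obs_buffer.append(s)
        self.pending.append(action)
        # the agent sees the state from `delay` steps ago, plus the actions since
        return (self.obs_buffer[0], tuple(self.pending))
```

The augmented state space grows exponentially in the delay (one action slot per delayed step), but any standard MDP algorithm applies to it unchanged.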
Next up in our 2024 research overview: reinforcement learning under delays. The usual control loop assumes immediate observation and action in each time step, but that's not always possible, as processing observations and decisions can take time. How can we learn to control delayed systems?
Led by the tireless Kolby Nottingham, partly during his AI2 internship, in collaboration with Bodhisattwa Majumder, Bhavana Dalvi Mishra, @sameer-singh.bsky.social, and Peter Clark, this work was published in ICML 2024.
SSO outperforms Reflexion and ReAct on a NetHack benchmark.
Skill Set Optimization (SSO) achieves record task success rates on the ScienceWorld and NetHack benchmarks, compared with existing memory-based language agents (ReAct, Reflexion, CLIN).
How to curate a skill set? Keep evicting skills that are rarely used in high-reward interactions. Here we rely on another prompt to tell us which skills it thinks actually informed actions in successful executions. We only keep those.
How to learn new skills? Take pairs (or more) of high-reward experiences that follow similar state trajectories and ask a language model to describe their shared prototype, i.e. a joint abstraction of their end state + a list of abstract instructions that hint at their actions.
Here, skill = abstraction of initial state + subgoal + instructions list. We keep a set of those.
How to use a skill set? Retrieve the most relevant skills for the current state (highest similarity to skill initial state) and put them in context for the language agent.
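The skill structure, retrieval, and curation steps above can be sketched as below — a toy, hypothetical implementation where `similarity` stands in for whatever embedding similarity is used, and `credited` stands in for the prompt's judgment of which skills informed successful executions; none of these names come from the paper.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    init_state: str       # abstraction of the initial state
    subgoal: str          # joint abstraction of the end state
    instructions: list    # abstract instructions hinting at actions

def retrieve(skills, state, similarity, k=3):
    """Pick the k skills whose initial-state abstraction best matches the
    current state; these go into the language agent's context."""
    return sorted(skills, key=lambda sk: similarity(sk.init_state, state),
                  reverse=True)[:k]

def curate(skills, credited):
    """Evict skills not credited as informing actions in high-reward
    executions (that credit assignment is itself done by a prompt)."""
    return [sk for sk in skills if sk in credited]
```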