RLC2026 Call for Workshops!
We’re already live: openreview.net/group?id=rl-...
Here's the opportunity to help shape the conference & spotlight your own RL focus areas.
Call: rl-conference.cc/call_for_wor...
Deadline: Mar 12 (AoE)
And don't forget the awesome banquet :) www.cirquedusoleil.com/ludo
Posts by Brad Knox
RLC'26 is inviting submissions!
Please mark your calendars; we're looking forward to all the new ideas.
Happy holidays!
@eugenevinitsky.bsky.social @audurand.bsky.social @schaul.bsky.social @glenberseth.bsky.social @sologen.bsky.social @pcastr.bsky.social @bradknox.bsky.social @cvoelcker.bsky.social
Hi RL Enthusiasts!
RLC is coming to Montreal, Quebec, in the summer: Aug 16–19, 2026!
Call for Papers is up now:
Abstract: Mar 1 (AOE)
Submission: Mar 5 (AOE)
Excited to see what you’ve been up to. Submit your best work!
rl-conference.cc/callforpaper...
Please share widely!
We're a couple of months into this exciting initiative, AHOI, and we've had two great speakers so far: Joe Carlsmith and Ryan Lowe. Check out our website for future speakers, videos of past talks, and our email list.
liberalarts.utexas.edu/news/ai-huma...
Our paper, Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners, won the Outstanding Paper Award on Emerging Topics in Reinforcement Learning this year at RLC! Congrats to 1st author @cmuslima.bsky.social!
Paper: sites.google.com/ualberta.ca/...
Propose some socials for RLC! Research topics, affinity groups, niche interests, whatever comes to mind!
rl-conference.cc/call_for_soc...
To do non-LLM RLHF research with real, non-author HUMAN data, few datasets were available last I checked. Synthetic data is usually quite unrealistic, which can compromise your results (see bradknox.net/human-prefer...).
Our dataset (and code) with real humans: dataverse.tdl.org/dataset.xhtm...
Cursor + Claude pretending to do work.
Vibecoding apparently requires a magic touch I lack. In two attempts from scratch, Cursor AI + Claude 3.5 went off the rails constantly and eventually degenerated into non-functionality.
Degeneration #2: Claude is only pretending to run terminal commands and edit my code. 🤦
Giving such shorter-horizon feedback does tend to result in more varied reward values. And this variation bears some resemblance to the meaning of the word dense, which I suspect is the origin of this misnomer. (4/n)
I find that what people really mean by "dense" is that so-called denser reward functions are giving feedback on *recent* state-action pairs, thus reducing the credit assignment problem (at some risk of misalignment). (3/n)
In standard RL, all reward functions give reward at every time step. A reward of 0 is informative, as is a reward of -1. So all reward functions are dense. (2/n)
Calling a reward function dense or sparse is a misnomer, AFAICT. (1/n)
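(An aside, not from the thread above: a minimal Python sketch under assumed, illustrative names. It shows the point being made — both a so-called "sparse" and a so-called "dense" reward function return a value at every time step; the "dense" one just varies with recent progress, shortening the credit-assignment horizon.)

```python
# Illustrative sketch only: both reward functions below are defined at every step.
# The "sparse" one returns 0 until the goal is reached; the "dense" (shaped) one
# scores recent progress, so its value reflects the latest state-action pair.

import numpy as np

GOAL = np.array([10.0, 10.0])  # hypothetical goal location in a toy 2D task

def sparse_reward(state, action, next_state):
    # A reward of 0 is still emitted every step; +1 only near the goal.
    return 1.0 if np.linalg.norm(next_state - GOAL) < 0.5 else 0.0

def dense_reward(state, action, next_state):
    # Rewards progress made on the most recent transition, so values vary step to step.
    return np.linalg.norm(state - GOAL) - np.linalg.norm(next_state - GOAL)

if __name__ == "__main__":
    s, s_next = np.array([0.0, 0.0]), np.array([0.3, 0.4])
    print(sparse_reward(s, None, s_next))  # 0.0 -- still a reward, emitted this step
    print(dense_reward(s, None, s_next))   # ~0.5 -- tied to recent progress
```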
RLC Keynote speakers: Leslie Kaelbling, Peter Dayan, Rich Sutton, Dale Schuurmans, Joelle Pineau, Michael Littman
Some extra motivation for those of you in RLC deadline mode: our line-up of keynote speakers -- as all accepted papers get a talk, they may attend yours!
@rl-conference.bsky.social
Announcement of Richard S. Sutton as RLC 2025 keynote speaker
Excited to announce the first RLC 2025 keynote speaker, a researcher who needs little introduction, whose textbook we've all read, and who keeps pushing the frontier on RL with human-level sample efficiency
Work led by Stephane Hatgis-Kessell, in collaboration with @reniebird.bsky.social, @scottniekum.bsky.social, Peter Stone, and me. Full paper: arxiv.org/pdf/2501.06416
Our findings suggest that human training and preference elicitation interfaces are essential tools for improving alignment in RLHF. The interventions of studies 2 and 3 can be applied in real-world settings and suggest fundamentally new methods for model alignment. (8/n)
Study 3: Simply changing the question asked during preference elicitation. (7/n)
Study 2: Training people to follow a specific preference model. (6/n)
Study 1 intervention: Show humans the quantities that underlie a preference model---normally unobservable information derived from the reward function. (5/n)
Every study involved a control condition and two intervention conditions that attempted to influence humans toward a preference model. In all 3 studies, each intervention (1) significantly influenced humans toward the preference model and (2) led to learning more aligned reward functions. (4/n)
We answer this question with 3 human studies. Without trying to alter the human's unobserved reward function, we change how humans use this reward function to generate preferences so that they better match the preference model assumed by an RLHF algorithm. (3/n)
RLHF algorithms typically require a model of how humans generate preferences. But a poor model of humans risks learning a poor approximation of the human’s unobservable reward function. We ask: *Can we influence humans to more closely conform to a desired preference model?* (2/n)
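(An aside, not from the thread: one commonly assumed preference model in RLHF is a Bradley-Terry / Boltzmann-rational model over segment returns. The sketch below illustrates what "preference model" means generically under that standard assumption; the paper's exact model may differ.)

```python
# Hedged sketch of a common RLHF assumption: the probability that a human
# prefers segment A over segment B is a logistic (Bradley-Terry) function of
# the difference in the segments' returns under the unobserved reward function.

import math

def segment_return(rewards, gamma=1.0):
    """Discounted sum of per-step rewards over a trajectory segment."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def pref_prob(rewards_a, rewards_b, beta=1.0):
    """P(human prefers A over B) under Bradley-Terry with rationality coefficient beta."""
    diff = beta * (segment_return(rewards_a) - segment_return(rewards_b))
    return 1.0 / (1.0 + math.exp(-diff))

if __name__ == "__main__":
    # A segment with slightly higher return is preferred more often, but not always:
    print(round(pref_prob([0.0, 1.0, 1.0], [0.0, 1.0, 0.0]), 3))  # ~0.731
```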
First page of the paper Influencing Humans to Conform to Preference Models for RLHF, by Hatgis-Kessell et al.
Our proposed method of influencing human preferences.
RLHF algorithms assume humans generate preferences according to normative models. We propose a new method for model alignment: influence humans to conform to these assumptions through interface design. Good news: it works!
#AI #MachineLearning #RLHF #Alignment (1/n)