
Posts by Daniel Jiang


First draft online version of The RLHF Book is DONE. Recently I've been creating the advanced discussion chapters on everything from Constitutional AI to evaluation and character training, but I also sneak in consistent improvements to the RL-specific chapter.

rlhfbook.com

1 year ago

At ICLR 2025 in Singapore, my co-authors and I presented two papers on RL. Feel free to let us know of any feedback and let me know if you'd like to chat!
- openreview.net/forum?id=AOl...
- openreview.net/forum?id=AOl...

11 months ago
Postdoctoral Researcher, Monetization (PhD)
Meta's mission is to build the future of human connection and the technology that makes it possible.

Topics of interest include offline RL, post-training large language models with RLHF, and long-term recommendation systems. If you’re interested, please email me and/or apply here: www.metacareers.com/jobs/1142270...

1 year ago

Our team at Meta is hiring a postdoc researcher! Our group conducts both fundamental and applied research in reinforcement learning, with a focus on applications in Meta's advertising systems.

1 year ago
Turing Award Goes to A.I. Pioneers Andrew Barto and Richard Sutton
Andrew Barto and Richard Sutton developed reinforcement learning, a technique vital to chatbots like ChatGPT.

Congrats to this year's Turing award winners! www.nytimes.com/2025/03/05/t...

Incidentally, if you'd like to hear from them, we know a place where they've given / are giving keynotes.

1 year ago
ASOS Digital Experiments Dataset
A novel dataset that can support the end-to-end design and running of Online Controlled Experiments (OCE) with adaptive stopping. Hosted on the Open Science Framework.

There’s one from ASOS.com that provides A/B test data over time (across many experiments, each with several arms).

Dataset: osf.io/64jsb/

Paper: arxiv.org/abs/2111.10198

We used it in a paper to benchmark an AE method. But I’d also love to know of other alternatives out there.
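A minimal sketch of what adaptive stopping over such time-series A/B data could look like: a naive fixed-threshold rule on a two-proportion z statistic, checked at each cumulative checkpoint. The per-arm count schema is assumed (not the dataset's actual format), and a real design would correct for repeated peeking.

```python
from math import sqrt

def z_stat(succ_a: int, n_a: int, succ_b: int, n_b: int) -> float:
    # Two-proportion z statistic on cumulative conversions for two arms.
    p_a, p_b = succ_a / n_a, succ_b / n_b
    p = (succ_a + succ_b) / (n_a + n_b)  # pooled rate under the null
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def should_stop(checkpoints, threshold: float = 2.8):
    """Return the index of the first checkpoint whose |z| exceeds the
    threshold, or None. Each checkpoint is a cumulative tuple
    (successes_a, trials_a, successes_b, trials_b); a conservative
    threshold crudely compensates for the repeated looks."""
    for t, (sa, na, sb, nb) in enumerate(checkpoints):
        if abs(z_stat(sa, na, sb, nb)) > threshold:
            return t
    return None

# Stops at the second look, once the gap in conversion rates is clear.
print(should_stop([(50, 1000, 60, 1000), (500, 10000, 800, 10000)]))
```

This is only an illustration of the problem shape; proper adaptive-stopping designs (e.g., group-sequential or always-valid inference) are what the dataset is meant to support.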

1 year ago

Given a high-quality verifier, language model accuracy can be improved by scaling inference-time compute (e.g., w/ repeated sampling). When can we expect similar gains without an external verifier?

New paper: Self-Improvement in Language Models: The Sharpening Mechanism

arxiv.org/abs/2412.01951
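The repeated-sampling setup can be sketched as best-of-N selection with an external verifier. Both `generate_samples` and `verifier_score` below are hypothetical stand-ins (a canned pool instead of a real model), not the paper's method:

```python
def generate_samples(prompt: str, n: int) -> list[str]:
    # Hypothetical stand-in for drawing n samples from a language model;
    # a fixed pool of candidate answers plays the role of the sampler.
    pool = ["5", "four", "4", "22"]
    return [pool[i % len(pool)] for i in range(n)]

def verifier_score(prompt: str, answer: str) -> float:
    # Hypothetical external verifier: higher means more likely correct.
    return 1.0 if answer == "4" else 0.0

def best_of_n(prompt: str, n: int = 8) -> str:
    # Scale inference-time compute: sample n candidates, keep the one
    # the verifier scores highest.
    candidates = generate_samples(prompt, n)
    return max(candidates, key=lambda a: verifier_score(prompt, a))

print(best_of_n("What is 2 + 2?"))  # → 4
```

With a reliable verifier, accuracy grows with n because only one of the n samples needs to be correct; the sharpening question is when the model can play the verifier's role itself.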

1 year ago
Reinforcement Learning: An Overview
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based RL, policy-gradient methods, model-based met...

An updated intro to reinforcement learning by Kevin Murphy: arxiv.org/abs/2412.05265! Like his other books, it covers a lot and is quite up to date with modern approaches. Its coverage is also pretty unique; I don't think a lot of this is synthesized anywhere else yet.

1 year ago

I know one of the organizers is @eugenevinitsky.bsky.social. They did a great job and organized a very enjoyable conference.

1 year ago

I collected some folk knowledge for RL and stuck them in my lecture slides a couple weeks back: web.mit.edu/6.7920/www/l... See Appendix B... sorry, I know, appendix of a lecture slide deck is not the best for discovery. Suggestions very welcome.

1 year ago

Want to learn / teach RL? 

Check out our new book draft:
Reinforcement Learning - Foundations
sites.google.com/view/rlfound...
W/ Shie Mannor & Yishay Mansour
This is a rigorous first course in RL, based on our teaching at TAU CS and Technion ECE.

1 year ago

New paper: Do social media algorithms shape affective polarization?

We ran a field experiment on X/Twitter (N=1,256) using LLMs to rerank content in real-time, adjusting exposure to polarizing posts. Result: Algorithmic ranking impacts feelings toward the political outgroup! 🧵⬇️
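A real-time reranking intervention like this can be sketched as a simple score adjustment. The field names and the `polarization` score below are hypothetical stand-ins for the study's LLM-based classifier, not its actual pipeline:

```python
def rerank(posts: list[dict], weight: float = 1.0) -> list[dict]:
    """Reorder a feed so posts with high polarization scores are demoted.

    Each post carries an `engagement` score (the platform's usual ranking
    signal) and an LLM-assigned `polarization` score in [0, 1]; `weight`
    controls how strongly polarizing content is pushed down the feed.
    """
    return sorted(
        posts,
        key=lambda p: p["engagement"] - weight * p["polarization"],
        reverse=True,
    )

feed = [
    {"id": "a", "engagement": 5.0, "polarization": 0.9},
    {"id": "b", "engagement": 4.0, "polarization": 0.1},
]
print([p["id"] for p in rerank(feed, weight=2.0)])  # → ['b', 'a']
```

Varying `weight` across treatment arms is one way such an experiment could dose participants' exposure to polarizing posts.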

1 year ago

The RL (and some non-RL folks) starter pack is almost full. Pretty clear that the academic move here has succeeded.
go.bsky.app/3WPHcHg

1 year ago