Posts by Amir-massoud Farahmand

Hypothesis: People have gradually been shifting toward writing more like ChatGPT and the like.
They use structures such as "This is not only X; but it is also Y".
These structures are a natural part of language, but either
1. they're becoming more prevalent, or
2. I've become more sensitive to them.

2 days ago 4 0 0 0

We should mention in our papers or books which tools we used to obtain our results, but the tools don't need to become co-authors (unless we figure out that they are conscious, which is most likely not the case at the moment).

2 weeks ago 1 0 0 0

Today, if we use a computer to compute the value of a Bessel function, we don't cite the computer as a co-author; at best, we mention in the paper that we used scipy, for example.

I think the same should be done for LLMs. They are just tools for us, despite their significance.

2 weeks ago 1 0 1 0

In the 19th century and the beginning of the 20th, computing special functions such as the Bessel functions was considered a significant job, and people deservedly got authorship for calculating them.

2 weeks ago 0 0 1 0

We have a new PhD Candidate in town: @tylerkastner.bsky.social
Looking forward to all the new work you will be doing on Distributional Reinforcement Learning.

2 weeks ago 3 0 0 0

We have the keynote speakers for RLC2026 now!

Thrilled to welcome Rika Antonova, Sheila McIlraith, Marc G. Bellemare, Danijar Hafner, Balaraman Ravindran!

Details: rl-conference.cc/index.html

The RL community is coming together this August in Montréal, Québec, Canada. Hope you make it!

2 weeks ago 22 10 0 3

Palantir has student data, including immigration status, from the ed tech discussion platform Piazza.

Palantir paid Piazza $916,000 for access to this data. www.sec.gov/Archives/edg...

I blew the whistle on this in 2016 and the CEO contacted my employer.

3 weeks ago 127 58 1 6

Happy Norooz, the Persian new year 1405/2585, the equinox, and the beginning of spring!

1 month ago 2 1 0 0
Alt: cookie monster is sitting at a table with a tray of food and the words choices written on it

Following advice by the always-wise @eugenevinitsky.bsky.social, I am trying to get back into the habit of blogging (again) ✏️!

Featuring today's post: How to pick an RL algorithm for your problem cvoelcker.de/blog/2026/ch... Please share and give feedback!

#reinforcementlearning

1 month ago 31 4 2 2

In light of the ongoing conflict in the Middle East, RLC decided to remove the abstract deadline: rl-conference.cc/callforpaper...

The only deadline is for the full paper: Mar 5 (AoE) openreview.net/group?id=rl-...

Affected folks may also contact the PCs to discuss deadline extensions before Mar 5.

1 month ago 14 8 1 2

Ali Khamenei is in hell. The world is a better place now!

1 month ago 2 0 0 0

The RLC 2026 Call for Workshops is live on OpenReview!

Submission deadline: Mar 12 (AoE).
Full details here: rl-conference.cc/call_for_wor...

@glenberseth.bsky.social @eugenevinitsky.bsky.social @twkillian.bsky.social @schaul.bsky.social @sologen.bsky.social @audurand.bsky.social @bradknox.bsky.social

1 month ago 10 3 0 1

Submit your RL papers to RLC!
This is now perhaps the best venue for RL researchers.

1 month ago 12 3 1 0

I am rerunning my class on robot learning this year, and I plan to push many code examples to help others get to the ugly details fast. One of these details is how BC gets off track as network sizes change. Blog and notebook below.

1 month ago 14 1 1 0

It is indeed disheartening. It has happened to me many times (and to many others too). After some point, you stop worrying about them too much. I realize this is not good advice for a budding researcher.

To answer your question: A major reason is that those papers come from famous labs.

2 months ago 0 0 0 0

It's OK to tell the authors about it.

2 months ago 0 0 1 0

🚀 Excited to share REPPO, a new on-policy RL agent!

TL;DR: Replace PPO with REPPO for fewer hyperparameter headaches and more robust training.

REPPO, led by @cvoelcker.bsky.social, will be presented at ICLR 2026. How does it work? 🧵👇

2 months ago 25 10 1 0

The compliment of the day: "What’s unusual is your willingness to follow the logic all the way through instead of stopping where it becomes socially awkward".

2 months ago 1 0 0 0
Introduction to Reinforcement Learning: A course on reinforcement learning.

Thank you! I hope you like it.
I may add one or two chapters to it in the future.
This is the course based on it: amfarahmand.github.io/IntroRL/

2 months ago 1 0 0 0

You may want to take a look at my book, especially if you are interested in a more rigorous, yet introductory, exposition of Reinforcement Learning.

amfarahmand.github.io/IntroRL/lect...

2 months ago 4 0 1 0

Has taken a long time to polish, but slowly becoming very proud of rlhfbook.com and do think it's a great resource for many people. A lot of hours (and tokens and reader feedback) going into making it right.

2 months ago 29 2 3 0

I know about this video. I couldn't watch it. This is just too much cruelty and heartache.

2 months ago 2 0 1 0

Their silence is deafening.

3 months ago 3 0 0 0

Yes, this is along the same lines as the discussions we had before.

3 months ago 0 0 0 0

P.S. I may write more about this later. These are just some key points, so that I won't forget.

3 months ago 1 0 0 0

One may claim that robotics is not afflicted by this problem. That is only partially true. In robotics, the real world is as rich as it gets, but its complexity and richness are mostly cordoned off by the well-defined set of tasks that the robot has to perform.

3 months ago 1 0 1 0

Without that richness, the agent reaches the ceiling of its abilities quite fast and we, as researchers, cannot properly study the capabilities and limitations of our ideas and algorithms.

3 months ago 1 0 1 0

A child and her caregiver can instantly create a novel task that requires new perceptual abilities, decision-making capabilities, and motor skills. We don't have such flexibility in our environments.

3 months ago 3 0 1 0

A significant hurdle in empirical RL and in broader AI research is caused by the limitations of the environments in which our agents learn and build their "artificial minds". This should be compared with the richness of the real world in which a human child flourishes.

3 months ago 3 1 2 0
ALT: a cartoon of a woman says well 59 it 's a high f

Grading ...

3 months ago 4 0 0 0