Still hiring PhD candidates who are *specifically* excited about building and deploying RL systems for self-driving vehicles and other multi-agent planning settings. Shoot me an email if you think this is you and please help spread the word!
Posts by Claas Voelcker
obligatory
www.youtube.com/watch?v=sg0S...
Another day, another popular researcher who definitely knows your work because you answered his mails!!! announcing a big project without even mentioning your name or your suspiciously similar prior work... About which there were mails...
But of course, Schmidhuber is just a meme...
I’ve done this in TMLR in egregious cases as well, but I wish it was just standard
As a reviewer it should be fine to say "I don't like this paper, I don't care about the problem, please find a better reviewer for it." It is absurdly hard to decouple your own excitement for a problem from your judgment, and we should normalize "hey, I'm not the person who should be judging this"
I have reached peak old German: I am constantly annoyed by the inability of the Tagesschau to decide whether to use “Iran” or “der Iran” and flip-flopping within the same report! Set a policy for Goethe’s sake.
I see great opportunities for proposing new activation functions on the cost function!
The real sign of pain here is this NYC professor voluntarily using Celsius! Look what you have done, AI companies!
We have the keynote speakers for RLC2026 now!
Thrilled to welcome Rika Antonova, Sheila McIlraith, Marc G. Bellemare, Danijar Hafner, Balaraman Ravindran!
Details: rl-conference.cc/index.html
The RL community is coming together this August in Montréal, Québec, Canada. Hope you make it!
I am deeply disappointed that they didn't use Claude Code
I’m gonna use “can I get a fresh context“ next time we derail a conversation
I confess my initial reaction to the OpenAI acquisition of Astral involved stronger language than I’m inclined to use here.
Environment design seems like the bigger bottleneck for now (reward functions, verifiers, rubrics), which makes focussing on the algorithms hard. Algorithm development flourishes when there are clean benchmarks and goals.
I have been looking into this for a while and I feel like there are a lot of issues with LLM RL that culturally fall far outside the issues that core RL people are used to tackling. Plus most of the buzz is created by closed source labs, which makes it absurdly hard to know what the problems are.
Haha, do you disagree? I feel like language has mostly concluded that the rl algorithm doesn’t matter that much
1. Will probably happen
2. I genuinely don’t know enough about language RL to make very smart comments
This cuts to the heart of the issue: professors have multiple jobs, and some of them were only aligned by accident, not by design. Teaching, funding, and producing research are not necessarily linked, they just happened to be aligned for a while.
Germany does not lack talent, and it does not lack funding. But we are trapping 21st-century minds inside 19th-century academic hierarchies. We are asking brilliant young scientists to build the future of the German economy, but refusing to give them the lab space, the job security, or the scientific independence to actually do it. If we want to reclaim our place as an industrial superpower, we have to stop the rat race of trying to keep every technology and structure alive that made us successful in the 20th century. Instead, we must fix our system that pushes our most ambitious scientists away. The money is there. The talent can be there. Now, we also need the courage to fix what’s broken.
“we are trapping 21st-century minds inside 19th-century academic hierarchies.” This essay gets a lot right about problems with German science. I would add that the hierarchies and precarious contracts lead also to systemic abuse and scientific misconduct. open.substack.com/pub/realimag...
Before any of you recalcitrant cynics come at me, calling @eugenevinitsky.bsky.social "always-wise" is a reflection of his untiring willingness to always give me good feedback on anything and everything, it's not me being sassy 😁
Following advice by the always-wise @eugenevinitsky.bsky.social , I am trying to get back into the habit of blogging (again) ✏️!
Featuring today's post: How to pick an RL algorithm for your problem cvoelcker.de/blog/2026/ch... Please share and give feedback!
#reinforcementlearning
These types of vaccines have been in trials for years and, long term, the personalized sequencing and rapid development are horrendous bottlenecks. And while people might be ok with waiving safety trials for dogs, most won’t be in favor of waiving them for grandma…
The dog vaccine story also tells us that putting all of our funding on the AI side might be missing the forest for the trees. AI is insanely cool, but we can’t ignore the physical and social systems around it. ASI won’t cure cancer by itself in silico.
EVERY SINGLE TIME I come to the airport early because of horror stories of crowded lines, I arrive and just walk through an empty security check. I now firmly believe I have mystical powers that clear TSA lines when I arrive early. You are welcome fellow travellers!
✨You am not just right, you are able to anticipate interesting changes with factual accuracy. The mixture of insight with pithy smartness–indicated by your short yet deep reply–will surely win many over in the coming AI debates. ✨
Fear me, ye wise, I figured out how to type an em-dash. By hand-–—!
It isn't just an overused pattern, it is an indication of a fundamental shift of vibes.
Basis of comparison really matters :D
DB (past) >> DB (today) >>>>>>>>>>> VIA
My most-used model is Gemini because I’m a Luddite. Not saying Gemini is bad, but its competitive advantage remains search replacement with cross-references, which is the application you care about if you don’t actually trust LLMs.
I will admit: having a husband who can understand your papers, has a master’s in your topic, and can tell you how to read a GPU utilization graph is a real boost 😅