
Posts by Theresa Eimer

Leibniz, looking at the universe: "Why is there something instead of nothing?"

Me, looking at my Outlook calendar: same

1 month ago
autorl-org/arlbench · Datasets at Hugging Face

A WIP, more or less. But here's the matching data: huggingface.co/datasets/aut...
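For anyone who wants to poke at it, here's a hedged sketch of loading the data with the `datasets` library; the exact configuration names and splits of autorl-org/arlbench are an assumption here, so check the dataset card for the actual layout:

```python
# Hedged sketch: loading the arlbench data from the Hugging Face Hub with the
# `datasets` library. Splits and configuration names are assumptions here;
# check the dataset card on huggingface.co for the actual layout.
from datasets import load_dataset

ds = load_dataset("autorl-org/arlbench")  # may need an explicit config name
print(ds)  # inspect the available splits and columns
```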

1 month ago
eCDF plot of DQN. The curve is pretty bad: 80% of configurations are below 50% of max performance.

Untunable? Very uncharitable, don't you think? If you look at a DQN hyperparameter eCDF, you clearly see that it can perform well! Just, you know... incredibly rarely 🤡
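For context, a minimal sketch of what such an eCDF plot computes; the scores below are made up and just stand in for the actual DQN sweep results:

```python
# Minimal sketch (illustrative data): empirical CDF of hyperparameter
# configurations' performance, normalized to the best configuration.
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-configuration scores; in practice these would be the
# final returns of each DQN configuration from the sweep.
rng = np.random.default_rng(0)
scores = rng.beta(a=1.5, b=5.0, size=200)  # most configs score poorly

normalized = scores / scores.max()          # fraction of max performance
x = np.sort(normalized)
y = np.arange(1, len(x) + 1) / len(x)       # empirical CDF

plt.step(x, y, where="post")
plt.xlabel("Normalized performance (fraction of best config)")
plt.ylabel("Fraction of configurations")
plt.title("eCDF over DQN hyperparameter configurations (illustrative)")
plt.show()
```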

1 month ago

Meet @theeimer.bsky.social, Postdoc at @unihannover.bsky.social 🇩🇪, ELLIS Member working on AutoRL, hyperparameter optimization & Reinforcement Learning evaluation.

Her challenge at work: bridging #AutoML + RL and sharpening communication to make her work clear to both communities.

#WomenInELLIS

2 months ago

A living document means we welcome any discussion and additions! Use the Issues and PRs in the repository to improve this document, and hopefully it can be a resource for years to come.

4 months ago

A little Christmas present from and for the COSEAL community: a compilation of the best research practices and workflow recommendations in a living document: github.com/coseal/COSEA...

Our goal is to improve research quality in meta-algorithmics and to give new researchers an easier start 💪

4 months ago

I'm super fascinated by the randomized results in the talk, though. It could be hard to spot, but basically I tuned PPO with the evaluation done either on 1 seed per HP config, on 20 seeds, or on 20 runs with randomized seed, n_envs, hidden size and activation. The latter performed way better both on the default eval and in transfer!
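A rough sketch of what the third, randomized setup could look like; the value ranges and the helper below are illustrative assumptions, not the exact setup from the talk:

```python
# Rough sketch (assumed ranges): sample randomized evaluation settings
# instead of using a single fixed seed per hyperparameter configuration.
import random

def sample_eval_settings(n_runs=20, base_seed=0):
    rng = random.Random(base_seed)
    settings = []
    for _ in range(n_runs):
        settings.append({
            "seed": rng.randrange(1_000_000),
            "n_envs": rng.choice([4, 8, 16, 32]),
            "hidden_size": rng.choice([64, 128, 256]),
            "activation": rng.choice(["tanh", "relu"]),
        })
    return settings

# Each PPO hyperparameter configuration would then be scored by averaging
# its return over these randomized runs rather than over one fixed seed.
for s in sample_eval_settings():
    print(s)
```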

4 months ago

...But that's probably a very specific point of view. It currently seems very difficult to me to evaluate algorithms that are supposed to solve everything at once if we focus mostly on solution scores or fixed benchmarks.

4 months ago

I just gave a talk (aka thinking out loud) at the BeNRL seminar about expressiveness of evaluations. I landed closer to "show people more and potentially weird things" rather than "standardize the setup"...

theeimer.github.io/assets/pdf/s...

4 months ago

Foundation models on the AutoML podcast 2/3: are LLMs killing AutoML? It's probably not that simple. Listen for more details 😉

5 months ago

Stealing all of the recommendations!

This made me think of The Left Hand Of Darkness, though I guess that's actually almost the opposite, communication bridging a seemingly impossible gap in understanding each other...

5 months ago

I fell into a hole, but made it out again with new episodes! This is part one of three of an accidental series on foundation models. The next parts will be released in October and November, so stay tuned!

6 months ago

Great opportunity to work with great people. Go apply!

7 months ago
AI Allergy
I remember being excited about AI. I remember 20 years ago, being excited about neuroevolutionary methods for learning adaptive behaviors in...

New blog post: AI Allergy.

On my increasing disgust with the AI discourse, even though I still like the technical and philosophical. And how I wish I could be excited about AI again.

togelius.blogspot.com/2025/08/ai-a...

8 months ago

It is time

9 months ago

The "reproducibility crisis" in science constantly makes headlines. Repro efforts are often limited. What if you could assess reproducibility of an entire field?

That's what @brunolemaitre.bsky.social et al. have done. Fly immunity is highly replicable & offers lessons for #metascience

A 🧵 1/n

9 months ago
Getting SAC to Work on a Massive Parallel Simulator: Tuning for Speed (Part II) | Antonin Raffin | Homepage
This second post details how I tuned the Soft Actor-Critic (SAC) algorithm to learn as fast as PPO in the context of a massively parallel simulator (thousands of robots simulated in parallel).

Need for Speed or: How I Learned to Stop Worrying About Sample Efficiency

Part II of my blog series "Getting SAC to Work on a Massive Parallel Simulator" is out!
I've included everything I tried that didn't work (and why Jax PPO was different from PyTorch PPO)

araffin.github.io/post/tune-sa...

9 months ago

1/2 Offline RL has always bothered me. It promises that by exploiting offline data, an agent can learn to behave near-optimally once deployed. In real life, it breaks this promise, requiring large amounts of online samples for tuning and offering no guarantees of behaving safely to achieve desired goals.

10 months ago

Crazy volume! On the other hand, not that surprising. We also got one of these, and only did so because it was such a good deal that even if our complete lack of experience makes research on it hard, we can use it for teaching only and still be okay with spending the money. I doubt we're the only ones!

10 months ago
AutoML School 2025
Scope: AutoML has become a cornerstone in the toolkit of many developers and researchers. With the rise of foundation models, AutoML's potential has expanded even further, enabling smarter, more powerf...

📢 Only 3 Weeks to Go!

The AutoML summer school (June 10-13th) is just around the corner, and there is not much time left to register!

---> www.automlschool.org <---

👇 We added several new speakers to the program

10 months ago
I got fooled by AI-for-science hype—here's what it taught me
I used AI in my plasma physics research and it didn’t go the way I expected.

Going to the hospital because I broke my wrist smashing the endorse button:
www.understandingai.org/p/i-got-fool...

11 months ago

"We can only presume to build machines like us once we see ourselves as machines first." (Abeba Birhane, 2022, p. 13)

This is the core. So true.

11 months ago
The Future of AVs Panel | 2023 CCAT Symposium | Day 1 (YouTube video by the Center for Connected and Automated Transportation)

Panel discussion on the current economic precarity of autonomous vehicle businesses. www.youtube.com/watch?v=gDG-...

"We are at a really tough spot in generating flows of cash right now." 👇

11 months ago

After a short era in which people questioned the value of academia in ML, its value is more obvious than ever. Big labs stopped publishing the minute commercial incentives showed up and are relentlessly focused on a singular vision of scaling. Academia is a meaningful complement, bringing...
1/2

1 year ago

It's strange to me that the focus of many people's worry is still "superintelligence" and not the reality we're currently living where increasingly authoritarian governments wield technology oppressively.

This fantastical distraction based on speculative rhetoric is increasingly harmful.

1 year ago
Humanoid Robots in Manufacturing
Or, there's a reason we don't pull cars with mechanical horses

A sensible perspective on humanoids in manufacturing (TL;DR: if you can make humanoids, you can probably make better, more manufacturing-specific things)
blog.spec.tech/p/humanoid-r...

1 year ago

Mark your calendars, EWRL is coming to Tübingen! 📅
When? September 17-19, 2025.
More news to come soon, stay tuned!

1 year ago
Llama 4: Did Meta just push the panic button?
One of the weirdest releases of the year and understanding the future of the Llama endeavor. For the time being, we have some more amazing open weight models!

Llama 4 was a messy release: unreleased finetunes boosting scores, rumors of training on test, released on a weekend, etc

As (open) models are commoditized / competition grows, what is the role of Meta's Llama efforts in the future? Should they continue?

1 year ago

At least there is no need to jailbreak the model anymore 🫠 (Is there a counterpart to make it nicer 🎭?)

1 year ago

The school kids visiting me during this year's Future Day really had hard-hitting questions: "Do you still have a lot of free time?"

Me, a pretty fresh and currently slightly overwhelmed postdoc: "It's important to be good at time management. Like my colleague, maybe you should ask her."

1 year ago