🎉 Presenting at #ICML2025 tomorrow!
Come and explore how representational similarities behave across datasets :)
📅 Thu Jul 17, 11 AM-1:30 PM PDT
📍 East Exhibition Hall A-B #E-2510
Huge thanks to @lorenzlinhardt.bsky.social, Marco Morik, Jonas Dippel, Simon Kornblith, and @lukasmut.bsky.social!
📝 Paper: arxiv.org/abs/2503.05683
🗂️ Dataset: huggingface.co/datasets/luk...
💻 Code: github.com/ExplainableM...
🚨 Poster at #ICML2025!
How can LLMs really keep up with the world?
Come by E-2405 on July 15th (4:30–7:00pm) to check out WikiBigEdit – our new benchmark to test lifelong knowledge editing in LLMs at scale.
🔗 Real-world updates
📈 500k+ QA edits
🧠 Editing vs. RAG vs. CL
🧠🤖 We’re hiring a Postdoc in NeuroAI!
Join CRC1233 "Robust Vision" (Uni Tübingen) to build benchmarks & evaluation methods for vision models, bridging brain & AI. Work with top faculty & shape vision research.
Apply: tinyurl.com/3jtb4an6
#NeuroAI #Jobs
📢 Landed in Nashville🎺 for #CVPR2025! The EML group is presenting 4 exciting papers — come say hi at our poster sessions! More details in the thread — see you there! 🏁🌟
🚨 Happy to announce that our paper "Understanding the Limits of Lifelong Knowledge Editing in LLMs" has been accepted at #icml2025! Congrats to @lukasthede.bsky.social, @confusezius.bsky.social, Matthias Bethge, @zeynepakata.bsky.social, and @tomhartvigsen.bsky.social. 👇 Highlights in the thread
🎓PhD Spotlight: Jae Myung Kim
We’re thrilled to celebrate Jae Myung Kim, who will defend his PhD on 25th June! 🎉
Jae Myung began his PhD at @unituebingen.bsky.social as part of the ELLIS & IMPRS-IS programs, advised by @zeynepakata.bsky.social and collaborating closely with Cordelia Schmid.
We’ve landed in Singapore for #ICLR2025!
The EML group is presenting 4 exciting papers — come say hi at our poster sessions! 👇Let's chat!
More details in the thread — see you there! 🌟
10/
This project was a joint effort with amazing collaborators:
👥 @confusezius.bsky.social , Matthias Bethge, @zeynepakata.bsky.social , and @tomhartvigsen.bsky.social
Huge thanks to them for the ideas, feedback, and countless hours that made this work possible. 🙏
9/
📘 Want to test your method at scale?
📄 Paper: arxiv.org/abs/2503.05683
🗂️ Benchmark: huggingface.co/datasets/luk...
💻 Code: github.com/ExplainableM...
Let’s build LLMs that truly stay up to date. 🔄
Excited to see what the community does with this!
8/
🔍 TL;DR:
✅ We release WikiBigEdit - a new large-scale benchmark for real-world factual updates
🚨 Existing editing methods fail to scale
💡 Finetuning + merging is a surprisingly strong baseline
🧩 RAG wins - but with trade-offs
7/
Surprisingly, simple continual finetuning (LoRA) outperforms all editing baselines - at equal inference cost.
And when paired with model merging, performance improves even further over time.
💪 More scalable, more robust, and better retention across time steps.
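For readers who want to see the shape of this baseline: a minimal sketch of continual LoRA finetuning with uniform weight merging across time steps. The model name, LoRA hyperparameters, and the simple uniform average are illustrative assumptions, not the paper's exact recipe.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM could stand in here
tokenizer = AutoTokenizer.from_pretrained(base_name)

# One (question, answer) batch per time step; fill from the benchmark's time-stamped splits.
time_step_batches: list[list[tuple[str, str]]] = []

def finetune_on_batch(model, qa_batch, lr=1e-4, epochs=1):
    """One continual step: LoRA-finetune on the newest factual edits, then fold the adapter back in."""
    peft_model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))
    optim = torch.optim.AdamW(peft_model.parameters(), lr=lr)
    for _ in range(epochs):
        for q, a in qa_batch:
            inputs = tokenizer(f"{q} {a}", return_tensors="pt")
            loss = peft_model(**inputs, labels=inputs["input_ids"]).loss
            loss.backward()
            optim.step()
            optim.zero_grad()
    return peft_model.merge_and_unload()  # fold LoRA weights into the base model

def uniform_merge(state_dicts):
    """Model merging: average parameters across all time steps seen so far."""
    return {k: sum(sd[k] for sd in state_dicts) / len(state_dicts) for k in state_dicts[0]}

model = AutoModelForCausalLM.from_pretrained(base_name)
step_states = []
for qa_batch in time_step_batches:
    model = finetune_on_batch(model, qa_batch)
    step_states.append({k: v.clone() for k, v in model.state_dict().items()})
    model.load_state_dict(uniform_merge(step_states))  # deploy the merged model at each step
```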
6/
RAG performs best overall - nearly tripling accuracy on edit and generalization tasks.
But:
⏳ It comes with significantly higher inference cost
🔄 And still struggles with multi-hop reasoning over updated facts
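To make the trade-off concrete, here is a minimal sketch of a RAG baseline of this kind: index the factual updates, retrieve the top-k at query time, and prepend them to the prompt. The encoder, fact store, and example fact are assumptions for illustration; the retrieval step plus the longer prompt is where the extra inference cost comes from.
```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any text encoder works

fact_store: list[str] = []           # accumulated factual updates across time steps
fact_embs: list[np.ndarray] = []

def add_facts(facts: list[str]) -> None:
    """Index new factual updates as they arrive (no weight updates needed)."""
    fact_store.extend(facts)
    fact_embs.extend(embedder.encode(facts))

def build_prompt(question: str, k: int = 3) -> str:
    """Retrieve the k most similar facts and prepend them to the question.
    The retrieval call and the longer prompt are the sources of extra inference cost."""
    scores = np.array(fact_embs) @ embedder.encode(question)
    context = [fact_store[i] for i in np.argsort(-scores)[:k]]
    return "Facts:\n" + "\n".join(context) + f"\nQuestion: {question}\nAnswer:"

add_facts(["The CEO of Acme Corp is Jane Doe (as of 2024)."])  # hypothetical update
print(build_prompt("Who is the CEO of Acme Corp?"))
```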
5/
The result? 📉
Most editing methods struggle at scale.
ROME and MEMIT collapse within a few hundred updates.
Even WISE, built for lifelong edits, degrades quickly - converging to pre-edit performance.
➡️ These techniques aren’t yet ready for real-world demands.
4/
We put popular editing methods to the test:
🔧 ROME, MEMIT, WISE
🔁 LoRA finetuning & merging
🔍 Retrieval-augmented generation (RAG)
How do they stack up on update accuracy, reasoning, generalization, and locality?
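For intuition, a minimal sketch of a lifelong-editing evaluation loop of this kind: apply each time step's edits sequentially (no reset between steps) and score edit accuracy, generalization, and locality after every step. The field names and exact-match scoring are illustrative assumptions; plug in your own apply_edits (ROME, MEMIT, WISE, LoRA, ...) and answer functions.
```python
from typing import Callable

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def evaluate_lifelong(model,
                      time_steps,                # list of edit batches, one per time step
                      apply_edits: Callable,     # (model, batch) -> edited model
                      answer: Callable):         # (model, question) -> str
    history = []
    for t, batch in enumerate(time_steps):
        model = apply_edits(model, batch)        # sequential edits, never reset
        metrics = {
            # did the new facts stick?
            "edit_acc": sum(exact_match(answer(model, ex["question"]), ex["answer"])
                            for ex in batch) / len(batch),
            # do rephrased questions still work? (hypothetical field name)
            "generalization": sum(exact_match(answer(model, ex["rephrased"]), ex["answer"])
                                  for ex in batch) / len(batch),
            # are unrelated facts left untouched? (hypothetical field names)
            "locality": sum(exact_match(answer(model, ex["unrelated_q"]), ex["unrelated_a"])
                            for ex in batch) / len(batch),
        }
        history.append((t, metrics))
    return history
```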
3/
Unlike synthetic edit datasets, WikiBigEdit tracks real-world knowledge changes over time.
It probes multi-hop reasoning, semantic generalization, and whether new edits interfere with existing knowledge.
And it’s built to continuously grow - for future-proof evaluation.
2/
📣 Introducing WikiBigEdit: a new benchmark for lifelong knowledge editing.
It includes:
📌 500K+ real-world QA pairs based on Wikidata
📆 8 time steps over 6 months (Feb–Jul 2024) and continuously updatable
🧪 Rich evaluations: reasoning, generalization, locality, …
1/
Most LLMs are static snapshots of past knowledge.
But facts change constantly - and retraining is far too costly.
Knowledge editing offers a cheaper fix.
But how far can it actually take us?
We put it to the test at realistic deployment scale.
🧠 Keeping LLMs factually up to date is a common motivation for knowledge editing.
But what would it actually take to support this in practice at the scale and speed the real world demands?
We explore this question and really push the limits of lifelong knowledge editing in the wild.
👇
Happy to share that we have 4 papers to be presented at the upcoming #ICLR2025 in the beautiful city of #Singapore. Check out our website for more details: eml-munich.de/publications. We will introduce the talented authors and their papers very soon, stay tuned 😉
🚨 New paper alert! 🚨
We’ve just launched openretina, an open-source framework for collaborative retina modeling across datasets and species.
A 🧵👇 (1/9)
CuratedThoughts: Data Curation for RL Datasets 🚀
Since DeepSeek-R1 introduced reasoning-based RL, datasets like Open-R1 & OpenThoughts have emerged for fine-tuning & GRPO. Our deep dive found major flaws: 25% of OpenThoughts had to be removed during data curation.
Here's why 👇🧵
🔥 #CVPR2025 Submit your cool papers to Workshop on
Emergent Visual Abilities and Limits of Foundation Models 📷📷🧠🚀✨
sites.google.com/view/eval-fo...
Submission Deadline: March 12th!
Hiring announcement: ELLIS Institute Tübingen is looking for ML Researchers & Engineers for Open-Source AI Tutoring (m/f/d).
🚀 We’re hiring! Join Bernhard Schölkopf & me at @ellisinsttue.bsky.social to push the frontier of #AI in education!
We’re building cutting-edge, open-source AI tutoring models for high-quality, adaptive learning for all pupils with support from the Hector Foundation.
👉 forms.gle/sxvXbJhZSccr...
🚨Great Models Think Alike and this Undermines AI Oversight🚨
New paper quantifies LM similarity
(1) LLM-as-a-judge favors more similar models 🤥
(2) Complementary knowledge benefits Weak-to-Strong Generalization☯️
(3) More capable models have more correlated failures 📈🙀
🧵👇
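As a rough illustration of what "correlated failures" means, here is a simplified stand-in for a model-similarity measure: Cohen's kappa over per-sample correctness of two models on a shared benchmark. This is not the paper's metric, just a sketch of the idea.
```python
from sklearn.metrics import cohen_kappa_score

def error_similarity(correct_a: list[bool], correct_b: list[bool]) -> float:
    """Agreement of right/wrong patterns beyond what accuracy alone would predict.
    1.0 = identical failure patterns, 0.0 = roughly independent errors."""
    return cohen_kappa_score(correct_a, correct_b)

# Hypothetical per-sample correctness of two models on the same benchmark:
model_a = [True, True, False, False, True, False]
model_b = [True, True, False, True,  True, False]
print(error_similarity(model_a, model_b))
```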
Our EML team has 4 #ICLR25 Papers accepted! I am proud of my students and grateful to be a part of many successful collaborations. More details will appear on our website (www.eml-munich.de) but here are the snapshots.
📄 New Paper: "How to Merge Your Multimodal Models Over Time?"
arxiv.org/abs/2412.06712
Model merging assumes all finetuned models are available at once. But what if they need to be created over time?
We study Temporal Model Merging through the TIME framework to find out!
🧵
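As a taste of the setting, a minimal sketch of one temporal-merging strategy: keep a running weight average of finetuned checkpoints as they arrive over time. This is illustrative only; the TIME framework compares a range of initialization, deployment, and merging choices.
```python
def running_merge(avg_state: dict, new_state: dict, t: int) -> dict:
    """After the t-th checkpoint (t starting at 1), update the uniform running average of weights."""
    return {k: avg_state[k] + (new_state[k] - avg_state[k]) / t for k in avg_state}

# Usage: as each task's finetuned model arrives, fold it into the deployed merge.
# checkpoints = [model_t1.state_dict(), model_t2.state_dict(), ...]  # hypothetical
checkpoints: list[dict] = []
merged = None
for t, state in enumerate(checkpoints, start=1):
    merged = state if t == 1 else running_merge(merged, state, t)
```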
🚨Looking to test your foundation model on an arbitrary and open-ended set of capabilities, not explicitly captured by static benchmarks? 🚨
Check out ✨ONEBench✨, where we show how sample-level evaluation is the solution.
🔎 arxiv.org/abs/2412.06745
🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist?
Turns out you can, and here is how: arxiv.org/abs/2411.15099
Really excited to share this work on multimodal pretraining for my first bluesky entry!
🧵 A short and hopefully informative thread:
🚀New Paper: Active Data Curation Effectively Distills Multimodal Models
arxiv.org/abs/2411.18674
Smol models are all the rage these days & knowledge distillation (KD) is key for model compression!
We show how data curation can act as an effective distillation strategy, yielding SoTA FLOP-efficient {C/Sig}LIPs!
🧵👇
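For intuition, a hedged sketch of model-based data selection ("active curation"): score candidate examples by how much easier a strong reference model finds them than the current learner, and keep only the top fraction for training. Whether this matches the paper's exact selection criterion is an assumption; it only illustrates the general recipe.
```python
import torch

def select_batch(learner_losses: torch.Tensor,
                 reference_losses: torch.Tensor,
                 keep_fraction: float = 0.2) -> torch.Tensor:
    """Return indices of examples with the highest 'learnability' score:
    currently hard for the learner, but easy for the reference (teacher) model."""
    scores = learner_losses - reference_losses
    k = max(1, int(keep_fraction * len(scores)))
    return torch.topk(scores, k).indices

# Hypothetical per-example contrastive losses for a candidate super-batch:
learner = torch.tensor([2.1, 0.3, 1.7, 0.9])
reference = torch.tensor([0.4, 0.2, 1.5, 0.1])
print(select_batch(learner, reference, keep_fraction=0.5))  # indices of the 2 most learnable examples
```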