Video diffusion models learn motion indirectly through pixels.
But motion itself is much lower-dimensional.
We introduce 64× temporally compressed motion embeddings that directly capture scene dynamics.
This enables efficient planning: 10,000× faster than video models.
🧵👇
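Not from the thread or paper, just a minimal PyTorch sketch of what 64× temporal compression of motion could look like: a per-frame encoder followed by three stride-4 temporal convolutions (4³ = 64). The input format (dense optical flow), module names, and sizes are all illustrative assumptions.

```python
# Hypothetical sketch: compress a motion sequence 64x along time.
# Assumes motion is given as dense optical flow [B, T, 2, H, W]; the real
# model's inputs, layers, and sizes are not specified in the post.
import torch
import torch.nn as nn


class MotionCompressor(nn.Module):
    def __init__(self, in_ch=2, dim=256):
        super().__init__()
        # Per-frame spatial encoder: flow field -> one feature vector per frame.
        self.frame_enc = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=4), nn.GELU(),
            nn.Conv2d(64, dim, 4, stride=4), nn.GELU(),
            nn.AdaptiveAvgPool2d(1),                 # [B*T, dim, 1, 1]
        )
        # Temporal compressor: three stride-4 convs -> 4**3 = 64x fewer steps.
        self.temporal = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=4),
        )

    def forward(self, flow):                         # flow: [B, T, 2, H, W]
        b, t, c, h, w = flow.shape
        x = self.frame_enc(flow.flatten(0, 1)).view(b, t, -1)  # [B, T, dim]
        z = self.temporal(x.transpose(1, 2)).transpose(1, 2)   # [B, T//64, dim]
        return z


if __name__ == "__main__":
    flow = torch.randn(1, 128, 2, 64, 64)            # 128 frames of toy flow
    z = MotionCompressor()(flow)
    print(z.shape)                                   # torch.Size([1, 2, 256])
```

The intuition behind the speed-up, as far as the post states it: a planner rolling out such embeddings touches 64× fewer temporal positions, and each position is far cheaper than rendering a pixel frame.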
You don't imagine the future by mentally rendering a movie. You trace how things move -- abstractly, sparsely, step by step.
We built a model that does exactly this. It predicts motion, not pixels -- and it's 3,000× faster than video world models.
Myriad, accepted at
@cvprconference.bsky.social
Finally, Felix will present his work on making diffusion transformer training extremely efficient, going from costing multiple months of rent to less than a single night at a conference hotel!
compvis.github.io/tread/
Stefan and Timy will be talking about how we can achieve extremely efficient motion prediction in open-set settings: bsky.app/profile/stef...
Pingchuan and Ming will be presenting two works on modeling the evolution of artistic style and disentangled representation learning.
See the following thread for more details bsky.app/profile/pima...
Excited to share that we'll be presenting four papers at the main conference at ICCV 2025 this week!
Come say hi in Honolulu!
👋 Pingchuan, Ming, Felix, Stefan, Timy, and Björn Ommer will be attending.
🎉 From @elsa-ai.eu: 15 new members join the European Lighthouse on Secure & Safe AI—expanding reach across Europe and deepening ties with the @ellis.eu ecosystem.
Everything you need to know 👉 elsa-ai.eu/elsa-welcome...
Fascinating approach — encoding an entire image into a single continuous latent token via self-supervised representation learning.
RepTok 🦎 highlights how compact generative representations can retain both realism and semantic structure.
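As a rough illustration of the general idea (a hedged sketch only, not RepTok's actual design): a patch-tokenized transformer encoder with one learnable summary token, whose output is read out as the single continuous latent. Layer sizes, the readout, and the training objective below are assumptions.

```python
# Hypothetical sketch: encode an image into a single continuous latent token.
# A learnable [LAT] token attends over patch tokens; its final state is the
# one-token representation. Sizes and readout are illustrative assumptions.
import torch
import torch.nn as nn


class SingleTokenEncoder(nn.Module):
    def __init__(self, img_size=256, patch=16, dim=512, depth=6, latent_dim=256):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        self.lat_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.readout = nn.Linear(dim, latent_dim)

    def forward(self, img):                              # img: [B, 3, H, W]
        x = self.patchify(img).flatten(2).transpose(1, 2)       # [B, N, dim]
        lat = self.lat_token.expand(x.size(0), -1, -1)          # [B, 1, dim]
        x = self.encoder(torch.cat([lat, x], dim=1) + self.pos)
        return self.readout(x[:, 0])                     # single continuous token


if __name__ == "__main__":
    z = SingleTokenEncoder()(torch.randn(2, 3, 256, 256))
    print(z.shape)                                       # torch.Size([2, 256])
```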
🤔 What happens when you poke a scene — and your model has to predict how the world moves in response?
We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.
It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇
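To make the interface concrete, here is a hedged sketch of what "predicting the distribution of motion from sparse pokes" could look like: poke tokens (position + displacement) and query-point tokens go through a small transformer, and a mixture-of-Gaussians head outputs a distribution over each query point's 2D motion instead of a single flow vector. Module names, the mixture head, and all hyperparameters are assumptions for illustration, not FPT's actual architecture.

```python
# Hypothetical sketch: predict a *distribution* of 2D motion at query points,
# conditioned on sparse pokes (position + displacement). A small transformer
# mixes poke and query tokens; a GMM head outputs means, scales, and weights.
import torch
import torch.nn as nn


class PokeToMotionDistribution(nn.Module):
    def __init__(self, dim=128, n_modes=4, depth=4):
        super().__init__()
        self.embed_poke = nn.Linear(4, dim)    # (x, y, dx, dy) per poke
        self.embed_query = nn.Linear(2, dim)   # (x, y) per query point
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.mixer = nn.TransformerEncoder(layer, num_layers=depth)
        # Per query point: n_modes * (2 means + 2 log-scales + 1 logit).
        self.head = nn.Linear(dim, n_modes * 5)
        self.n_modes = n_modes

    def forward(self, pokes, queries):
        # pokes: [B, P, 4], queries: [B, Q, 2]
        tokens = torch.cat([self.embed_poke(pokes),
                            self.embed_query(queries)], dim=1)
        out = self.mixer(tokens)[:, pokes.size(1):]      # keep query tokens only
        p = self.head(out).view(*out.shape[:2], self.n_modes, 5)
        means, log_scales, logits = p[..., :2], p[..., 2:4], p[..., 4]
        return means, log_scales.exp(), logits.softmax(dim=-1)


if __name__ == "__main__":
    model = PokeToMotionDistribution()
    mu, sigma, w = model(torch.randn(1, 3, 4), torch.rand(1, 100, 2))
    print(mu.shape, sigma.shape, w.shape)  # [1,100,4,2] [1,100,4,2] [1,100,4]
```

A mixture output is one natural way to keep the multi-modality the post emphasizes: the same poke can plausibly move the scene in several distinct ways, and a single regressed flow vector would average them away.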
𝗖𝗮𝗹𝗹 𝗳𝗼𝗿 𝗳𝘂𝗹𝗹𝘆 𝗳𝘂𝗻𝗱𝗲𝗱 𝗣𝗵𝗗 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝘀: We are offering several PhD positions across our various research areas, open to highly qualified candidates.
‼️ The application portal will be open from 15 October to 14 November 2025.
Find out more: mcml.ai/opportunitie...
🎧 ELLIOT on the airwaves!
How do we build open and trustworthy AI in Europe?
🎙️ In a recent radio interview, Luk Overmeire from VRT shared insights on ELLIOT, #FoundationModels and the role of public broadcasters in shaping human-centred AI.
📻 Interview in Dutch: mimir.mjoll.no/shares/JRqlO...
"What makes us human in an AI-shaped world?" — At #MCML Munich AI Day 2025, Neil Lawrence explored this question, reminding us of the indivisible human core machines can't replicate.
Björn Ommer followed with insights into how GenAI is commodifying intelligence and reshaping how we use computers.
🎉 The ELLIOT project Kick-off Meeting was successfully hosted by CERTH-ITI, in Thessaloniki! 🏛️
30 partners from 12 countries 🌍 launched this exciting journey to advance open, trustworthy AI and #FoundationModels across Europe. 🤖
Stay tuned for more updates on #AIresearch and #TrustworthyAI! 💡
🧹 CleanDiFT: Diffusion Features without Noise
@rmsnorm.bsky.social*, @stefanabaumann.bsky.social*, @koljabauer.bsky.social*, @frankfundel.bsky.social, Björn Ommer
Oral Session 1C (Davidson Ballroom): Friday 9:00
Poster Session 1 (ExHall D): Friday 10:30-12:30, # 218
compvis.github.io/cleandift/
🎉 Excited to share that our lab has three papers accepted at CVPR 2025!
Come say hi in Nashville!
👋 Johannes, Ming, Kolja, Stefan, and Björn will be attending.
📢 ELLIOT is coming! A €25M #HorizonEurope project to develop open, trustworthy Multimodal Generalist Foundation Models, #MGFM, for real-world applications. Starting July, it brings 30 partners from 12 countries to shape Europe’s #AI future.
🔍 Follow for updates on #OpenScience & #FoundationModels.
Continuous Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
@stefanabaumann.bsky.social, Felix Krause, Michael Neumayr, @rmsnorm.bsky.social, Melvin Sevi, @vtaohu.bsky.social, Björn Ommer
Poster Session 3 (ExHall D): Saturday 10:30-12:30, # 246
compvis.github.io/attribute-co...
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
@joh-schb.bsky.social*, @mgui7.bsky.social*, @frankfundel.bsky.social, Björn Ommer
Poster Session 6 (ExHall D): Sunday 16:00-18:00, # 208
github.com/CompVis/diff...
If you are interested, feel free to check the paper (arxiv.org/abs/2506.02221) or come by at CVPR:
📌 Poster Session 6, Sunday 4:00 to 6:00 PM, Poster #208
Grand Opening of the AI-HUB@LMU. The AI-HUB@LMU is a platform that, for the first time, unites all 18 faculties of #LMU in a joint scientific community.
📅 January 29, 2025, 6:00 PM
📍 Große Aula, LMU Munich
Full program here: www.ai-news.lmu.de/grand-openin...
www.youtube.com/watch?v=bCy6...
Attending my first corporate-sponsored business conference: there’s a live band playing between talks to keep the energy up.
Meanwhile, academic conferences are struggling to afford coffee breaks. Want this for EPSA!
[Figure: our method pipeline]
🤔 When combining vision-language models (VLMs) with large language models (LLMs), do VLMs benefit from genuine additional semantics, or merely from artificial augmentations of the text, on downstream tasks?
🤨 Interested? Check out our latest work at #AAAI25:
💻 Code and 📝 paper at: github.com/CompVis/DisCLIP
🧵👇
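One way to make that question concrete (a hedged sketch only, not the method from the paper or repo): compare CLIP image-text scores when the class prompt carries genuinely descriptive text versus a purely formulaic augmentation. The model choice, the prompts, and the dummy image below are assumptions for illustration.

```python
# Hypothetical illustration of the question above: does richer text semantics
# change image-text alignment, compared to a formulaic augmentation?
# Uses Hugging Face's CLIP; the prompts and the random "image" are placeholders.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

prompts = {
    "plain":      "a photo of a goldfinch",
    "genuine":    "a photo of a goldfinch, a small songbird with a yellow body, "
                  "black wings and a red face",          # semantics an LLM might add
    "artificial": "a photo of a goldfinch, a goldfinch, the goldfinch",  # mere augmentation
}

with torch.no_grad():
    tok = tokenizer(list(prompts.values()), padding=True, return_tensors="pt")
    txt = model.get_text_features(**tok)
    txt = txt / txt.norm(dim=-1, keepdim=True)

    # Stand-in image; swap in a real preprocessed image for meaningful scores.
    img = model.get_image_features(pixel_values=torch.randn(1, 3, 224, 224))
    img = img / img.norm(dim=-1, keepdim=True)

    for name, score in zip(prompts, (img @ txt.T).squeeze(0)):
        print(f"{name:10s} cosine similarity: {score:.3f}")
```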