Video diffusion models learn motion indirectly through pixels.
But motion itself is much lower-dimensional.
We introduce 64× temporally compressed motion embeddings that directly capture scene dynamics.
This enables efficient planning: 10,000× faster than video models.
🧵👇
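Not from the thread or paper, just a minimal PyTorch sketch of what 64× temporal compression of motion could look like: a per-frame encoder followed by three stride-4 temporal convolutions (4³ = 64). The input format (dense optical flow), module names, and sizes are all illustrative assumptions.

```python
# Hypothetical sketch: compress a motion sequence 64x along time.
# Assumes motion is given as dense optical flow [B, T, 2, H, W]; the real
# model's inputs, layers, and sizes are not specified in the post.
import torch
import torch.nn as nn


class MotionCompressor(nn.Module):
    def __init__(self, in_ch=2, dim=256):
        super().__init__()
        # Per-frame spatial encoder: flow field -> one feature vector per frame.
        self.frame_enc = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=4), nn.GELU(),
            nn.Conv2d(64, dim, 4, stride=4), nn.GELU(),
            nn.AdaptiveAvgPool2d(1),                 # [B*T, dim, 1, 1]
        )
        # Temporal compressor: three stride-4 convs -> 4**3 = 64x fewer steps.
        self.temporal = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=4),
        )

    def forward(self, flow):                         # flow: [B, T, 2, H, W]
        b, t, c, h, w = flow.shape
        x = self.frame_enc(flow.flatten(0, 1)).view(b, t, -1)  # [B, T, dim]
        z = self.temporal(x.transpose(1, 2)).transpose(1, 2)   # [B, T//64, dim]
        return z


if __name__ == "__main__":
    flow = torch.randn(1, 128, 2, 64, 64)            # 128 frames of toy flow
    z = MotionCompressor()(flow)
    print(z.shape)                                   # torch.Size([1, 2, 256])
```

The intuition behind the speed-up, as far as the post states it: a planner rolling out such embeddings touches 64× fewer temporal positions, and each position is far cheaper than rendering a pixel frame.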
You don't imagine the future by mentally rendering a movie. You trace how things move -- abstractly, sparsely, step by step.
We built a model that does exactly this. It predicts motion, not pixels -- and it's 3,000× faster than video world models.
Myriad, accepted at
@cvprconference.bsky.social
Finally, Felix will present his work on making diffusion transformer training extremely efficient, going from costing multiple months of rent to less than a single night at a conference hotel!
compvis.github.io/tread/
Stefan and Timy will be talking about how we can achieve extremely efficient motion prediction in open-set settings: bsky.app/profile/stef...
Pingchuan and Ming will be presenting two works on modeling the evolution of artistic style and disentangled representation learning.
See the following thread for more details bsky.app/profile/pima...
Excited to share that we'll be presenting four papers at the main conference at ICCV 2025 this week!
Come say hi in Honolulu!
👋 Pingchuan, Ming, Felix, Stefan, Timy, and Björn Ommer will be attending.
🎉 From @elsa-ai.eu: 15 new members join the European Lighthouse on Secure & Safe AI—expanding reach across Europe and deepening ties with the @ellis.eu ecosystem.
Everything you need to know 👉 elsa-ai.eu/elsa-welcome...
Fascinating approach — encoding an entire image into a single continuous latent token via self-supervised representation learning.
RepTok 🦎 highlights how compact generative representations can retain both realism and semantic structure.
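As a rough illustration of the general idea (a hedged sketch only, not RepTok's actual design): a patch-tokenized transformer encoder with one learnable summary token, whose output is read out as the single continuous latent. Layer sizes, the readout, and the training objective below are assumptions.

```python
# Hypothetical sketch: encode an image into a single continuous latent token.
# A learnable [LAT] token attends over patch tokens; its final state is the
# one-token representation. Sizes and readout are illustrative assumptions.
import torch
import torch.nn as nn


class SingleTokenEncoder(nn.Module):
    def __init__(self, img_size=256, patch=16, dim=512, depth=6, latent_dim=256):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        self.lat_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.readout = nn.Linear(dim, latent_dim)

    def forward(self, img):                              # img: [B, 3, H, W]
        x = self.patchify(img).flatten(2).transpose(1, 2)       # [B, N, dim]
        lat = self.lat_token.expand(x.size(0), -1, -1)          # [B, 1, dim]
        x = self.encoder(torch.cat([lat, x], dim=1) + self.pos)
        return self.readout(x[:, 0])                     # single continuous token


if __name__ == "__main__":
    z = SingleTokenEncoder()(torch.randn(2, 3, 256, 256))
    print(z.shape)                                       # torch.Size([2, 256])
```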
🤔 What happens when you poke a scene — and your model has to predict how the world moves in response?
We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.
It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇
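To make the interface concrete, here is a hedged sketch of what "predicting the distribution of motion from sparse pokes" could look like: poke tokens (position + displacement) and query-point tokens go through a small transformer, and a mixture-of-Gaussians head outputs a distribution over each query point's 2D motion instead of a single flow vector. Module names, the mixture head, and all hyperparameters are assumptions for illustration, not FPT's actual architecture.

```python
# Hypothetical sketch: predict a *distribution* of 2D motion at query points,
# conditioned on sparse pokes (position + displacement). A small transformer
# mixes poke and query tokens; a GMM head outputs means, scales, and weights.
import torch
import torch.nn as nn


class PokeToMotionDistribution(nn.Module):
    def __init__(self, dim=128, n_modes=4, depth=4):
        super().__init__()
        self.embed_poke = nn.Linear(4, dim)    # (x, y, dx, dy) per poke
        self.embed_query = nn.Linear(2, dim)   # (x, y) per query point
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.mixer = nn.TransformerEncoder(layer, num_layers=depth)
        # Per query point: n_modes * (2 means + 2 log-scales + 1 logit).
        self.head = nn.Linear(dim, n_modes * 5)
        self.n_modes = n_modes

    def forward(self, pokes, queries):
        # pokes: [B, P, 4], queries: [B, Q, 2]
        tokens = torch.cat([self.embed_poke(pokes),
                            self.embed_query(queries)], dim=1)
        out = self.mixer(tokens)[:, pokes.size(1):]      # keep query tokens only
        p = self.head(out).view(*out.shape[:2], self.n_modes, 5)
        means, log_scales, logits = p[..., :2], p[..., 2:4], p[..., 4]
        return means, log_scales.exp(), logits.softmax(dim=-1)


if __name__ == "__main__":
    model = PokeToMotionDistribution()
    mu, sigma, w = model(torch.randn(1, 3, 4), torch.rand(1, 100, 2))
    print(mu.shape, sigma.shape, w.shape)  # [1,100,4,2] [1,100,4,2] [1,100,4]
```

A mixture output is one natural way to keep the multi-modality the post emphasizes: the same poke can plausibly move the scene in several distinct ways, and a single regressed flow vector would average them away.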
𝗖𝗮𝗹𝗹 𝗳𝗼𝗿 𝗳𝘂𝗹𝗹𝘆 𝗳𝘂𝗻𝗱𝗲𝗱 𝗣𝗵𝗗 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝘀: We are offering several PhD positions across our various research areas, open to highly qualified candidates.
‼️ The application portal will be open from 15 October to 14 November 2025.
Find out more: mcml.ai/opportunitie...
🎧 ELLIOT on the airwaves!
How do we build open and trustworthy AI in Europe?
🎙️ In a recent radio interview, Luk Overmeire from VRT shared insights on ELLIOT, #FoundationModels and the role of public broadcasters in shaping human-centred AI.
📻 Interview in Dutch: mimir.mjoll.no/shares/JRqlO...
"What makes us human in an AI-shaped world?" — At #MCML Munich AI Day 2025, Neil Lawrence explored this question, reminding us of the indivisible human core machines can't replicate.
Björn Ommer followed with insights into how GenAI is commodifying intelligence and reshaping how we use computers.
🎉 The ELLIOT project Kick-off Meeting was successfully hosted by CERTH-ITI, in Thessaloniki! 🏛️
30 partners from 12 countries 🌍 launched this exciting journey to advance open, trustworthy AI and #FoundationModels across Europe. 🤖
Stay tuned for more updates on #AIresearch and #TrustworthyAI! 💡
🧹 CleanDiFT: Diffusion Features without Noise
@rmsnorm.bsky.social*, @stefanabaumann.bsky.social*, @koljabauer.bsky.social*, @frankfundel.bsky.social, Björn Ommer
Oral Session 1C (Davidson Ballroom): Friday 9:00
Poster Session 1 (ExHall D): Friday 10:30-12:30, # 218
compvis.github.io/cleandift/
🎉 Excited to share that our lab has three papers accepted at CVPR 2025!
Come say hi in Nashville!
👋 Johannes, Ming, Kolja, Stefan, and Björn will be attending.
📢 ELLIOT is coming! A €25M #HorizonEurope project to develop open, trustworthy Multimodal Generalist Foundation Models, #MGFM, for real-world applications. Starting July, it brings 30 partners from 12 countries to shape Europe’s #AI future.
🔍 Follow for updates on #OpenScience & #FoundationModels.
Continuous Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
@stefanabaumann.bsky.social, Felix Krause, Michael Neumayr, @rmsnorm.bsky.social, Melvin Sevi, @vtaohu.bsky.social, Björn Ommer
Poster Session 3 (ExHall D): Saturday 10:30-12:30, # 246
compvis.github.io/attribute-co...
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
@joh-schb.bsky.social*, @mgui7.bsky.social*, @frankfundel.bsky.social, Björn Ommer
Poster Session 6 (ExHall D): Sunday 16:00-18:00, # 208
github.com/CompVis/diff...
If you are interested, feel free to check the paper (arxiv.org/abs/2506.02221) or come by at CVPR:
📌 Poster Session 6, Sunday 4:00 to 6:00 PM, Poster #208
Grand Opening of the AI-HUB@LMU. The AI-HUB@LMU is a platform that, for the first time, unites all 18 faculties of #LMU in a joint scientific community.
📅 January 29, 2025, 6:00 PM
📍 Große Aula, LMU Munich
Full program here: www.ai-news.lmu.de/grand-openin...
www.youtube.com/watch?v=bCy6...
Attending my first corporate-sponsored business conference: there’s a live band playing between talks to keep the energy up.
Meanwhile, academic conferences are struggling to afford coffee breaks. Want this for EPSA!
[Figure: our method pipeline]
🤔 When combining vision-language models (VLMs) with large language models (LLMs), do VLMs benefit from genuine additional semantics, or merely from artificial augmentations of the text, on downstream tasks?
🤨 Interested? Check out our latest work at #AAAI25:
💻 Code and 📝 paper at: github.com/CompVis/DisCLIP
🧵👇
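One way to make that question concrete (a hedged sketch only, not the method from the paper or repo): compare CLIP image-text scores when the class prompt carries genuinely descriptive text versus a purely formulaic augmentation. The model choice, the prompts, and the dummy image below are assumptions for illustration.

```python
# Hypothetical illustration of the question above: does richer text semantics
# change image-text alignment, compared to a formulaic augmentation?
# Uses Hugging Face's CLIP; the prompts and the random "image" are placeholders.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

prompts = {
    "plain":      "a photo of a goldfinch",
    "genuine":    "a photo of a goldfinch, a small songbird with a yellow body, "
                  "black wings and a red face",          # semantics an LLM might add
    "artificial": "a photo of a goldfinch, a goldfinch, the goldfinch",  # mere augmentation
}

with torch.no_grad():
    tok = tokenizer(list(prompts.values()), padding=True, return_tensors="pt")
    txt = model.get_text_features(**tok)
    txt = txt / txt.norm(dim=-1, keepdim=True)

    # Stand-in image; swap in a real preprocessed image for meaningful scores.
    img = model.get_image_features(pixel_values=torch.randn(1, 3, 224, 224))
    img = img / img.norm(dim=-1, keepdim=True)

    for name, score in zip(prompts, (img @ txt.T).squeeze(0)):
        print(f"{name:10s} cosine similarity: {score:.3f}")
```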