Posts by Kolja Bauer

Do we really need pixel generation to model motion? 🤔

We show how directly representing motion in a compact space enables efficient, scalable planning.

10,000× faster than video models, enabling planning and reasoning in open-world and robotics settings.

Check it out ⬇️

1 week ago

You don't imagine the future by mentally rendering a movie. You trace how things move -- abstractly, sparsely, step by step.
We built a model that does exactly this. It predicts motion, not pixels -- and it's 3,000× faster than video world models.
Myriad, accepted at @cvprconference.bsky.social

1 week ago

I’m thrilled to share that I’ll present two first-authored papers at #ICCV2025 🌺 in Honolulu together with @mgui7.bsky.social ! 🏝️
(Thread 🧵👇)

6 months ago

🤔 What happens when you poke a scene — and your model has to predict how the world moves in response?

We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.

It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇

6 months ago
Our method pipeline

🤔 When combining vision-language models (VLMs) with large language models (LLMs), do VLMs benefit from additional genuine semantics or artificial augmentations of the text for downstream tasks?

🤨 Interested? Check out our latest work at #AAAI25:

💻 Code and 📝 paper at: github.com/CompVis/DisCLIP

🧵👇

1 year ago

In order to extract features from diffusion models, you have to noise your input and tune the noise level for each downstream task. But isn't there a better way? 🤔

Turns out there is, using our newly proposed feature extraction method CleanDIFT 🧹🚀

Check it out ⬇️

1 year ago

Hi, I recently started as an ELLIS PhD student at Björn Ommer's lab. I would be happy to be on the list as well :)

1 year ago

After many years, our lab finally has a social media presence at @compvis.bsky.social ! 🥳
Give it a follow, we have some amazing research on generative computer vision coming soon!

1 year ago