Posts by Rishabh Kabra

The sandwich technique came up again, so I decided to frame it properly.

4 months ago

I had a score disappear even when the reviewer said they would maintain their score, so it likely has nothing to do with whether the score changed.

8 months ago
Scaling 4D Representations

Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.

Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d

9 months ago
Veo 3: Celebrating festival season (YouTube video by Google UK)

Veo 3 goes to Glastonbury:

www.youtube.com/watch?v=aKkr...

@googleuk.bsky.social

9 months ago
Video vs. image diffusion representations

Feature visualization for image and video diffusion

Generative video diffusion: does a model trained with this objective learn better features than one trained for image generation?

We investigated this question and more in our latest work. Please check it out!

*From Image to Video: An Empirical Study of Diffusion Representations*
arxiv.org/abs/2502.07001

1 year ago
Moving Off-the-Grid: Scene-Grounded Video Representations

A self-supervised video representation model that allows visual tokens to move “off-the-grid” to represent scene elements consistently as they move across the image plane. We evaluate on downstream tasks including point tracking, monocular depth estimation, and object tracking.

moog-paper.github.io

1 year ago
*Moving Off-the-Grid: Scene-Grounded Video Representations*.

Thursday afternoon poster.

1 year ago
Neural Assets

We learn per-object tokens (Neural Assets) that disentangle appearance and 3D pose from multi-object scenes. A sequence-of-tokens format allows us to reuse the text-to-image architecture of existing generative models.

neural-assets-paper.github.io

1 year ago
*Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models*.

Thursday morning poster.

1 year ago
I’m hanging out at NeurIPS this week. Come check out my co-authors’ presentations of the following Spotlight papers!

1 year ago