Aleksander Hołyński (@holynski) Bsky

We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as Next Token Prediction. For more, see: tap-next.github.io

1 year ago

Introducing MegaSaM!

Accurate, fast, & robust structure + camera estimation from casual monocular videos of dynamic scenes!

MegaSaM outputs camera parameters and consistent video depth, scaling to long videos with unconstrained camera paths and complex scene dynamics!

1 year ago

I know, it's hard to believe. But this thing really works.

Check out the website, there are a couple dozen interactive results and over 80 video examples in the gallery. No cherry-picking here.

mega-sam.github.io

1 year ago

I love SfM, but it's way less useful than it should be because of a handful of characteristic failures.

@zhengqi_li's new paper basically solves them all:

-No parallax? ✅
-No calibration? ✅
-Dynamic scenes? ✅
-Dense geometry? ✅

Best of all, it's super fast.

1 year ago

This is my favorite kind of social media content.

1 year ago

🎥 Introducing MultiFoley, a video-aware audio generation method with multimodal controls! 🔊
We can
⌨️Make a typewriter sound like a piano 🎹
🐱Make a cat meow like a lion roars! 🦁
⏱️Perfectly time existing SFX 💥 to a video.

arXiv: arxiv.org/abs/2411.17698
website: ificl.github.io/MultiFoley/

1 year ago

Quark is out!
Come check out our work on generalized real-time 3D reconstruction.
quark-3d.github.io

PS: We're looking for interns!

1 year ago

Stop watching videos, start interacting with worlds.

Stoked to share CAT4D, our new method for turning videos into dynamic 3D scenes that you can move through in real-time!
cat-4d.github.io
arxiv.org/abs/2411.18613

1 year ago

Check out CAT4D: our new paper that turns (text, sparse images, videos) => (dynamic 3D scenes)!

I can't get over how cool the interactive demo is.

Try it out for yourself on the project page: cat-4d.github.io

1 year ago

CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel vie...

We just dropped CAT4D, text to dynamic 3D models that you can render in real time. Not posting a video because Bluesky is garbage in this respect; go straight to the real time viewer on a desktop browser and look around. The cat kneading dough is my favorite.
cat-4d.github.io

1 year ago

Our group at Google DeepMind is now accepting intern applications for summer 2025. Attached is the official "call for interns" email; the links and email aliases that got lost in the screenshot are below.

1 year ago
