I like this git-diff-style figure a lot.
Posts by Karim Knaebel
Oh no, I love to do that. Counting my days 🥲
Do you also add a short cooldown phase after the constant part? I believe for self-supervised representation learning you don’t want to cool down, but for downstream tasks it usually gives a big boost.
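To make the shape concrete, here's a minimal sketch of the schedule I mean (warmup, then constant, then a short cooldown); the fractions and the linear decay shape are just placeholder choices:

```python
def lr_at_step(step: int, total_steps: int, base_lr: float,
               warmup_frac: float = 0.05, cooldown_frac: float = 0.1) -> float:
    """Warmup -> constant -> short cooldown ("trapezoidal" schedule)."""
    warmup_steps = int(warmup_frac * total_steps)
    cooldown_steps = int(cooldown_frac * total_steps)
    if step < warmup_steps:
        # linear warmup from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    if step < total_steps - cooldown_steps:
        # long constant phase
        return base_lr
    # short linear cooldown to 0 at the end of training
    return base_lr * (total_steps - step) / max(1, cooldown_steps)
```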
Were CVPR reviews ever made public since this motion?
I ran the first frame through an (undertrained) internal monocular model. When you zoom in, you can see the tree and street for scale. The clouds still seem too small though; only about 42x the size of the tree in the foreground.
How is it broken? (Genuine question)
Those are awesome, thanks for the inspiration
Congrats!!
So you run `uv sync -p 3.13` the first time (if you care about the exact version), and then you never need to pass it again. The existing venv “soft pins” its Python version until you explicitly request a different one, either via the CLI or by updating the project’s Python version constraints.
I use it all the time, super useful :) It won’t pin anything; it just makes sure your venv uses that Python version if it doesn’t already, or creates one if it doesn’t exist yet.
What’s the reason for pinning in general? I somehow never found a use case for it. If I want a specific version from that range, I pass it to `uv sync` on demand.
Very good read, thanks for writing this up :)
Does it still happen when you compile before DDP? I always use DDP as the outermost wrapper, there is too much hacky stuff going on in its forward and backward for my liking.
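For reference, a minimal sketch of that ordering, with a hypothetical toy model and assuming the usual process-group/device setup has already happened:

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical stand-in model; `local_rank` is assumed to come from
# the launcher (e.g. torchrun) after init_process_group has been called.
model = nn.Linear(512, 512).to(local_rank)
model = torch.compile(model)                  # compile the plain module first
model = DDP(model, device_ids=[local_rank])   # DDP stays the outermost wrapper
```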
The normal metrics in Tab. 1 also don’t match. I checked only DSINE, and its actual metrics are considerably better than those in Tab. 1. Some SOTA normal estimation methods are missing too. Tab. 3 covers only a small subset of common benchmarks, and on half of the datasets it’s worse than both VGGT and MoGe.
Same, I don’t want to use them anymore now, and if I “have to”, then I’m writing them as triple dashes ---. At least semicolons are still safe 👀
Understanding and reconstructing the 3D world are at the heart of computer vision and graphics. At #CVPR2025, we’ve seen many exciting works in 3D vision.
If you're pushing the boundaries, please consider submitting your work to #3DV2026 in Vancouver! (Deadline: Aug. 18, 2025)
In my mind, I often associate instance seg with fancy depth clustering.
I like the look. DINOv2, DUSt3R, CroCo v2, and GB are misspelled if you want to fix those.
I haven’t experienced that yet, but I’m not sure I’ve tried it on that many files. Which one did you try? `dua a` (or just `dua`) blocks and only prints at the end, which can take a long time. `dua i` starts an interactive TUI and loads everything in the background while you can already start inspecting.
🔥🔥🔥 CV folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th, right before CVPR, called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷
Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...
Big 🧵👇 with details!
Ah yes, you’re right, thanks for the reference! It’s really just a line in space without explicitly specifying a point on the line as its origin.
The origin is not lost though, it’s just encoded differently. In the end it’s still 6 dims, 3 for the direction and 3 for the origin. A bit unclear to me though what benefit this origin encoding brings.
Their proposed architecture only makes sense for a dense parameterization. I don’t think they ablate with other dense representations though.
Correction: the origin is already encoded in the Plücker ray. I meant that DiffusionSfM encodes /both/ origin and endpoint.
I think the first time I saw Plücker rays was in the Cameras as Rays paper arxiv.org/abs/2402.14817. The same authors also recently released DiffusionSfM, where they indeed keep the origins o as well.
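For context, the standard Plücker parameterization as I understand it: the ray is stored as a direction plus a moment, and the moment pins down the line (you can even recover a canonical point on it) without singling out the original ray origin:

```latex
\mathbf{r} = (\mathbf{d}, \mathbf{m}) \in \mathbb{R}^{6},
\qquad \mathbf{m} = \mathbf{o} \times \mathbf{d}
% the point on the line closest to the world origin:
\mathbf{o}' = \frac{\mathbf{d} \times \mathbf{m}}{\lVert \mathbf{d} \rVert^{2}}
```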
Agree, einops or shapes in comments are a must. It also makes it much easier for others to read and understand the code without executing it.
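A toy example of the two styles side by side (the shapes and the patchify use case are made up for illustration):

```python
import torch
from einops import rearrange

x = torch.randn(8, 3, 224, 224)  # (batch, channels, height, width)

# shape-in-comments style: annotate every transform
patches = x.unfold(2, 16, 16).unfold(3, 16, 16)  # (b, c, 14, 14, 16, 16)

# einops style: the pattern itself documents the shapes
patches = rearrange(x, "b c (h p1) (w p2) -> b (h w) (p1 p2 c)", p1=16, p2=16)
# -> (8, 196, 768): 14x14 patches, each flattened to 16*16*3 values
```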
Maybe one last one: could you try VGGT single-view on that image for a direct comparison? 😁
Very crisp!
True. Does MoGe do a better job at segmenting the sky here?