Hehe I think this one is for @parskatt.bsky.social, it is from DeDoDe, right?
My feeling is that this dark magic is related to gradients growing out of control from constantly adding residuals, so you can regularize by dividing by sqrt(2), but not sure :)
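For intuition, a toy sketch (pure numpy, not the actual DeDoDe code) of why dividing by sqrt(2) keeps the variance of a residual stream from blowing up with depth:

```python
import numpy as np

rng = np.random.default_rng(0)
plain = rng.standard_normal(10_000)
scaled = plain.copy()

for _ in range(8):
    # Pretend each residual branch outputs independent unit-variance noise.
    plain = plain + rng.standard_normal(10_000)
    scaled = (scaled + rng.standard_normal(10_000)) / np.sqrt(2)

# Plain residual adding: variance grows linearly with depth (~9 here).
# Dividing by sqrt(2): variance has fixed point 1 and stays there.
print(plain.var(), scaled.var())
```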
Posts by David Nordström
Catastrophic :). My girlfriend (also went to Chalmers) is furious.
What do you think?
Classic Claude correcting you: "I just went ahead and changed the DINOv3 name to v2 as you must be confusing it".
True :(. Sad MuM.
My final thought is that it is not great to use a frozen encoder trained on pixel reconstruction. It works, but it can be a little wonky. I hope to get some latent multi-view objective working sometime in the future.
We did also try finetuning the encoder in RoMa v2 (similar to UFM) and for this use-case I was very bullish on MuM. However, we experienced training instabilities when finetuning the encoder so we dropped that.
Hehe it is a good point :). Not saving it for another paper (though I am playing around with a MuM v2).
We tried using MuM in RoMa v2 but it did not seem to make a difference. Also, MuM representations, in contrast to DINO, are only good at the last couple of layers.
The days where all new 3dv pap3rs were dust3r extensions are over. All hail the new lord, inference optimization of vggt
It seems we have failed to communicate clearly about IMC26. Let's try again.
The competition this year is here:
kaggle.com/competitions...
No prizes, but a year-round leaderboard -- similar to KITTI and other academic benchmarks.
3D people, please retweet and share.
True; as mentioned in the other comment, the activations from e.g. VGGT might be quite heavy. Some learned scene compression might work.
Seems to be some major outage, rip
Thank you! Enjoy Brazil
Interesting, that makes sense. Possibly you could train some bottleneck scene representation that forces a compression rather than storing the full VGGT activations.
Thanks. My thinking would be that in many cases it would be fine with an e.g. 10 second mapping stage where you run a forward pass through your multi-view transformer and thereafter you can run lightweight decoding for query images in real-time. For training you could cache this map.
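Roughly the pattern I have in mind, as a toy numpy sketch (shapes and function names are made up for illustration, not VGGT's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mapping stage: one heavy multi-view forward pass
# (~seconds) produces scene tokens that can be cached as the "map".
def heavy_mapping_pass(images):
    return rng.standard_normal((len(images) * 196, 768))  # scene tokens

# Hypothetical per-query decoding: a single cheap cross-attention
# read against the cached tokens, fast enough for real-time queries.
def light_decode(query_feat, scene_tokens):
    attn = scene_tokens @ query_feat
    w = np.exp(attn - attn.max())
    w /= w.sum()
    return w @ scene_tokens  # attended scene context for this query

scene = heavy_mapping_pass(images=range(8))  # run once, cache for training
out = light_decode(rng.standard_normal(768), scene)
print(out.shape)
```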
A match made in heaven
Side note: pretty cool you managed to get a reviewer to update from 2 to 8 after a strong rebuttal. Will live on that hopium for ECCV rebuttals...
I guess my main curiosity is: What drove your decision to aggregate the scene information with this patch-mixing with the DUSt3R encoder rather than leveraging an encoding, like VGGT, that is natively multi-view?
I assume you could insert known camera poses into that representation in a similar fashion, i.e. ray encodings
Congratulations, cool approach of mixing patches from the reference images. Did you ever experiment with something like VGGT to encode the scene representation, possibly cache that, and then train the decoder to regress query images given that scene representation?
10 missed calls from Hartley
As usual, this is a collaboration with the Swedish CV special task force, i.e. @parskatt.bsky.social @fredkahl.bsky.social and @bokmangeorg.bsky.social
We aim to continue adding SotA matchers to the LoMa repo. So keep an eye out for that!
IMW paper: arxiv.org/abs/2604.11809
LoMa paper: arxiv.org/abs/2604.04931
LoMa-R is our newest addition to the LoMa family. In our IMW paper at #CVPR26, we investigate rotation invariance in the sparse matching pipeline. The resulting model is robust to rotations, even matching star constellations, and achieves strong upright performance.
github.com/davnords/loma
Sparse image matching is done via 1) keypoint detection in each image, 2) keypoint description, 3) matching of descriptions between images. Should rotation invariance be enforced at stage 2 or 3? Turns out both work fine! To be presented at the CVPR image matching workshop by @davnords.bsky.social
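The three stages above, as a toy sketch (random coordinates and descriptors standing in for a real detector/descriptor like DeDoDe; rotation invariance could be baked into the stage-2 descriptors, or absorbed by a learned stage-3 matcher):

```python
import numpy as np

rng = np.random.default_rng(0)

def detect(n=50):
    # Stage 1: keypoint detection -> (n, 2) pixel coordinates.
    return rng.uniform(0, 512, size=(n, 2))

def describe(kps, dim=128):
    # Stage 2: one L2-normalized descriptor per keypoint.
    d = rng.standard_normal((len(kps), dim))
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def match(dA, dB):
    # Stage 3: mutual-nearest-neighbour matching on cosine similarity.
    sim = dA @ dB.T
    nnA = sim.argmax(axis=1)
    nnB = sim.argmax(axis=0)
    return [(i, j) for i, j in enumerate(nnA) if nnB[j] == i]

kA, kB = detect(), detect()
matches = match(describe(kA), describe(kB))
print(len(matches))  # number of mutual-NN matches
```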
Accepted to the Image Matching Workshop at #CVPR26!
Introducing LoMa-R! A rotation invariant version of LoMa.
Code: github.com/davnords/loma
Paper: arxiv.org/abs/2604.11809
LoMa: Local Feature Matching Revisited
@davnords.bsky.social @parskatt.bsky.social, @bokmangeorg.bsky.social and 6 others
tl;dr: if you train DeDoDe+LightGlue on VGGT-scale data, it helps a LOT.
New IMC2022 SotA
arxiv.org/abs/2604.04931
Me too :)
Introducing LoMa, the next generation of feature matcher!
Congratulations! Very nice