Posts by Dominik Schnaus

Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

New paper: Back into Plato’s Cave

Are vision and language models converging to the same representation of reality? The Platonic Representation Hypothesis says yes. BUT we find the evidence for this is more fragile than it looks.

Project page: akoepke.github.io/cave_umwelten/

1/9

4 days ago 55 15 2 4

๐— ๐—–๐— ๐—Ÿ ๐—•๐—น๐—ผ๐—ด: Images and text are usually aligned using millions of imageโ€“caption pairs. But could they still be matched if they were never seen together?

In โ€œItโ€™s a (Blind) Match!โ€, MCML Members explore this question.
mcml.ai/news/2026-01...

3 months ago 2 1 0 0

🦖 We present “Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion”. #ICCV2025
🌐: visinf.github.io/scenedino/
📃: arxiv.org/abs/2507.06230
🤗: huggingface.co/spaces/jev-a...
@jev-aleks.bsky.social @fwimbauer.bsky.social @olvrhhn.bsky.social @stefanroth.bsky.social @dcremers.bsky.social

9 months ago 24 10 1 1

The code for our #CVPR2025 paper, PRaDA: Projective Radial Distortion Averaging, is now out!

Turns out distortion calibration from multiview 2D correspondences can be fully decoupled from 3D reconstruction, greatly simplifying the problem

arxiv.org/abs/2504.16499
github.com/DaniilSinits...

9 months ago 12 5 1 0

4/4

It’s a (Blind) Match! Towards Vision–Language Correspondence without Parallel Data

@schnaus.bsky.social @neekans.bsky.social @dcremers.bsky.social

📝 Paper: arxiv.org/pdf/2503.241...
🌐 Project page: dominik-schnaus.github.io/itsamatch/
💻 Code: github.com/dominik-schn...

10 months ago 0 0 0 0

3/4

✅ This enables unsupervised matching: finding vision-language correspondences without any paired data.

🤯 As a proof of concept, we build an unsupervised image classifier that assigns labels without seeing a single image-text pair.

10 months ago 0 0 1 0

2/4

🔍 As models and datasets scale, distances in vision and language embeddings become similar (Platonic Representation Hypothesis).

💡 We cast the matching task as a Quadratic Assignment Problem (QAP) and propose a new heuristic solver.
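
To make the QAP framing concrete, here is a minimal sketch on toy data. It is not the paper's heuristic solver: it uses exhaustive search, which only works for a handful of items, and every name in it (`pairwise_distances`, `qap_cost`, `brute_force_match`, the toy "vision" and "language" points) is a hypothetical stand-in. The "language" points are assumed to be a shuffled, rotated copy of the "vision" points, so the pairwise distances agree exactly, which real embeddings would only do approximately.

```python
import itertools
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix of a list of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def qap_cost(dist_a, dist_b, perm):
    """Squared mismatch between dist_a and dist_b after relabeling b by perm."""
    n = len(perm)
    return sum(
        (dist_a[i][j] - dist_b[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(n)
    )


def brute_force_match(dist_a, dist_b):
    """Try every permutation (toy sizes only) and keep the cheapest one."""
    n = len(dist_a)
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: qap_cost(dist_a, dist_b, perm),
    )


def rotate_xy(point, angle):
    """Rotate a 3D point around the z-axis; this preserves all distances."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


rng = random.Random(0)
n_items = 6

# Toy "vision" embeddings: random 3D points (assumption, not real features).
vision = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(n_items)]

# Hidden ground truth: vision item i corresponds to language item hidden[i].
hidden = list(range(n_items))
rng.shuffle(hidden)

# Toy "language" embeddings: the same geometry, rotated and stored in shuffled order.
language = [None] * n_items
for i, target in enumerate(hidden):
    language[target] = rotate_xy(vision[i], angle=0.7)

# Matching uses only the two distance matrices, never a paired example.
match = brute_force_match(pairwise_distances(vision), pairwise_distances(language))
print("recovered:", list(match))
print("hidden:   ", hidden)
```

Exhaustive search grows factorially with the number of items, which is why a heuristic solver is needed at realistic scale.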

10 months ago 0 0 1 0

Can we match vision and language representations without any supervision or paired data?

Surprisingly, yes!

Our #CVPR2025 paper with @neekans.bsky.social and @dcremers.bsky.social shows that the pairwise distances in both modalities are often enough to find correspondences.

⬇️ 1/4

10 months ago 27 12 1 0

Can you train a model for pose estimation directly on casual videos without supervision?

Turns out you can!

In our #CVPR2025 paper AnyCam, we directly train on YouTube videos and achieve SOTA results by using an uncertainty-based flow loss and monocular priors!

⬇️

11 months ago 25 10 1 1

Check out our latest #CVPR2025 paper AnyCam, a fast method for pose estimation in casual videos!

1️⃣ Can be trained directly on casual videos without the need for 3D annotation.
2️⃣ Built around a feed-forward transformer and lightweight refinement.

Code and more info: ⏩ fwmb.github.io/anycam/

11 months ago 23 6 1 0

We are thrilled to have 12 papers accepted to #CVPR2025. Thanks to all our students and collaborators for this great achievement!
For more details check out cvg.cit.tum.de

1 year ago 36 12 1 2

Indeed - everyone had a blast - thank you all for the great talks, discussions and skiing/snowboarding!

1 year ago 45 4 1 3