(5) The success of scaling text, images, and video should be an argument *for* scaling, not *against* other modalities.
(6) Efficiency matters. Hoping that models become as efficient as existing alternatives, without exploring how to improve those alternatives, is blindfolding ourselves.
6/6
Posts by Thiemo Alldieck
(4) As Aleks Holynski already pointed out on Twitter, text tokens and pixels are equally handcrafted. If we accept those as valid, singling out 3D as "too handcrafted" is logically inconsistent.
5/6
(3) We humans build spatial memory through physical interaction. I don't see how models can develop true spatial understanding without building a spatial memory themselves. 3D representations seem way more helpful here than observing 2D pixel streams.
4/6
(2) 3D is more than its representation. While specific data structures will evolve or disappear, 3D is the fundamental concept our world is grounded in. It will always be worth studying, even if models learn it implicitly (which we currently just hope for).
3/6
(1) Computer vision was developed to solve "real" problems like measuring, quality control, medical imaging, or mapping. These aren't just "fake tasks" waiting for an embodied agent.
2/6
Great read! Here are my 2 cents: I agree with the push toward end-to-end learning. However, the conclusion that CV will simply "go away" feels too dramatic and overly simplified. Here is what I believe was overlooked: 🧵
(cross-posting from Twitter)
1/6
Links: project page, Hugging Face paper page, arXiv abstract, PDF
We are looking for Student Researchers to work with us in Zurich 🇨🇭 next year!
If you work on depth and/or 3D reconstruction, please reach out!
Europe-based position:
www.google.com/about/career...
US-based position:
www.google.com/about/career...
Find me today at 4:30pm at the Google booth - let's chat! #CVPR2025
On my way to #CVPR2025
Looking forward to connecting!
If you expect a service (paper published), pay a price (review others). Isn't it that simple?
Excited to share that today our paper recommender platform www.scholar-inbox.com has reached 20k users! We hope to reach 100k by the end of the year. Lots of new features are currently being worked on and will be rolled out soon.
My group is looking for motivated PhD students who want to work on the future of digital humans.
Within the ERC project 'LeMo: Learning Digital Humans in Motion' there are two open positions:
www.career.tu-darmstadt.de/HPv3.Jobs/TU...
www.career.tu-darmstadt.de/HPv3.Jobs/TU...
hey everyone - I am now also active here and excited about computer vision and machine learning stuff.
Scroll Reverser is another one...
Come and work with us 💪