InteractVLM (#CVPR2025) is a great collaboration between MPI-IS, UvA, and Inria.
Authors: @saidwivedi.in, @anticdimi.bsky.social, S. Tripathi, O. Taheri, C. Schmid, @michael-j-black.bsky.social and @dimtzionas.bsky.social.
Code & models available at: interactvlm.is.tue.mpg.de (10/10)
Posts by Sai Kumar Dwivedi
InteractVLM is the first method that infers 3D contacts on both humans and objects from in-the-wild images, and exploits these for 3D reconstruction via an optimization pipeline. In contrast, existing methods like PHOSA rely on handcrafted or heuristic-based contacts. (9/10)
With just 5% of DAMON’s 3D body contact annotations, InteractVLM surpasses the fully-supervised DECO baseline trained on 100% of 3D annotations. This is promising for minimizing reliance on costly 3D data by using foundational models. (8/10)
InteractVLM also strongly outperforms prior work on object affordance prediction on the PIAD dataset. Here, affordance is defined as contact probabilities on the object surface. (7/10)
InteractVLM significantly outperforms prior work, both qualitatively and quantitatively, on in-the-wild 3D human (binary & semantic) contact prediction on the DAMON dataset. (6/10)
To bridge this 2D-to-3D gap, we propose "Render-Localize-Lift":
- Render: 3D human/object meshes into multiview 2D images.
- Localize: A Multiview Localization (MV-Loc) model, guided by VLM tokens, predicts 2D contact masks.
- Lift: 2D contact masks to 3D.
(5/10)
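The three steps above can be sketched in code. This is a toy illustration only, not the released InteractVLM code: the orthographic cameras, the vertex-level "predictor", and all function names are made up to show the render → localize → lift flow.

```python
import numpy as np

def render(verts, camera):
    # "Render": project 3D mesh vertices into a 2D view with a toy
    # orthographic camera (stand-in for a real multiview renderer).
    return verts @ camera.T                      # (V, 2) image-plane points

def localize(mask_predictor, image_points):
    # "Localize": a stand-in for the MV-Loc model; here a callable scores
    # each projected vertex with a 2D contact probability.
    return mask_predictor(image_points)          # (V,) per-vertex score

def lift(per_view_scores):
    # "Lift": aggregate 2D contact evidence from all views back onto the
    # 3D vertices by averaging across views.
    return np.mean(per_view_scores, axis=0)

# Toy example: 3 vertices, two orthographic views, a dummy predictor that
# marks a vertex "in contact" when it sits high in the view.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0]])
cameras = [np.array([[1, 0, 0], [0, 1, 0]], float),   # front view (x, y)
           np.array([[1, 0, 0], [0, 0, 1]], float)]   # top view   (x, z)
predictor = lambda pts: (pts[:, 1] > 0.5).astype(float)

scores = lift([localize(predictor, render(verts, cam)) for cam in cameras])
contact_vertices = scores > 0.5                  # [False, False, True]
```

The real method renders the body/object mesh from several viewpoints and uses VLM tokens to guide the 2D mask prediction; only the averaging-style aggregation back to 3D is mirrored here.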
How can we infer 3D contact with limited 3D data? InteractVLM exploits foundational models: a VLM and a localization model, fine-tuned to reason about contact. Given an image and a prompt, the VLM outputs tokens for localization. But these models work in 2D, while contact is 3D. (4/10)
Furthermore, simple binary contact (touching “any” object) misses the rich semantics of real multi-object interactions. Thus, we introduce a novel task, "Semantic Human Contact" estimation: predicting the contact points on a human that correspond to a specified object. (3/10)
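The distinction between binary and semantic contact can be shown with a tiny data sketch. Everything here is illustrative: the random masks are fake, and the 6890-vertex count assumes a SMPL body mesh.

```python
import numpy as np

V = 6890                     # assumed SMPL vertex count
rng = np.random.default_rng(0)

# Binary contact: a single mask answering "is this vertex touching ANY object?"
binary_contact = rng.random(V) > 0.95            # (V,) bool

# Semantic contact: one mask per queried object name, so the same body can
# carry distinct contact regions for each object in the scene.
semantic_contact = {
    "cup":   rng.random(V) > 0.99,
    "chair": rng.random(V) > 0.95,
}

# The union of semantic masks recovers a binary-style mask, but the reverse
# mapping (which object each vertex touches) is lost in the binary setting.
union = np.logical_or.reduce(list(semantic_contact.values()))
```

This is why the semantic task is strictly richer: binary contact is derivable from it, but not vice versa.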
Precisely inferring where humans contact objects from an image is hard due to occlusion & depth ambiguity. Current datasets of images with 3D contact are small as they’re costly & tedious to create (mocap/manual labeling), limiting performance of contact predictors. (2/10)
Why does 3D human-object reconstruction fail in the wild or get limited to a few object classes? A key missing piece is accurate 3D contact. InteractVLM (#CVPR2025) uses foundational models to infer contact on humans & objects, improving reconstruction from a single image. (1/10)
✨ Happy to be recognised again as an Outstanding Reviewer for #CVPR2025!
Thanks to the workshop organizers: @yixinchen.bsky.social, Baoxiong Jia, @yaoyaofeng.bsky.social, @songyoupeng.bsky.social, Chuhang Zou, @saidwivedi.in, Yixin Zhu, Siyuan Huang! 🙌
And the challenge organizers: Xiongkun Linghu, Tai Wang, Jingli Lin, Xiaojian Ma
📢 Excited to announce the 5th Workshop on 3D Scene Understanding for Vision, Graphics & Robotics at #CVPR2025! We’ll dive into multimodal 3D scene understanding & reasoning with amazing speakers and challenges.
@cvprconference.bsky.social
More Details: scene-understanding.com.
I've been using GitHub's Lists feature for over a year, and it's seriously underrated! ⭐
It lets you assign labels to all your starred repos, making it super easy to find projects later based on specific fields or topics. No more endless scrolling!
Link to my list: github.com/saidwivedi?t...
📢 I am #hiring 2x #PhD candidates to work on Human-centric #3D #ComputerVision at the University of #Amsterdam!
The positions are funded by an #ERC #StartingGrant.
For details and for submitting your application please see:
werkenbij.uva.nl/en/vacancies...
🆘 Deadline: Feb 16 🆘
Thanks for sharing :) @chrisoffner3d.bsky.social can you also please add me to the list? I work on 3D human avatars.
One of the best tutorials for understanding Transformers!
📽️ Watch here: www.youtube.com/watch?v=bMXq...
Big thanks to @giffmana.ai for this excellent content! 🙌
Would love to be in the list 😃
For those who missed this post on the-network-that-is-not-to-be-named, I made public my "secrets" for writing a good CVPR paper (or any scientific paper). I've compiled these tips over many years. It's long, but hopefully it helps people write better papers. perceiving-systems.blog/en/post/writ...