Can Generative Video Models Help Pose Estimation?
Yes!
We find that generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap.
inter-pose.github.io
Posts by Ethan Weber
For Toon3D (toon3d.studio), we made a labeler to make it easy to label points across many images. It visualizes depth and SAM masks too. The tool is online at labeler.toon3d.studio with docs at github.com/ethanweber/t.... Please reach out if you have any questions!
We ran DUSt3R on our cartoon reconstruction setting and found that it struggles (even with ground truth correspondences)! Our proposed work is designed to handle this challenging setting, where input images are hand-drawn and not geometrically consistent. toon3d.studio
Check out CAT4D: our new paper that turns (text, sparse images, videos) => (dynamic 3D scenes)!
I can't get over how cool the interactive demo is.
Try it out for yourself on the project page: cat-4d.github.io
We've released our paper "Generating 3D-Consistent Videos from Unposed Internet Photos"! Video models like Luma generate pretty videos, but they sometimes struggle with 3D consistency. We can do better by scaling them with 3D-aware objectives. 1/N
page: genechou.com/kfcw