Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
arxiv.org/abs/2503.16318
Posts by Semisance
Structured-Noise Masked Modeling for Video, Audio and Beyond
arxiv.org/abs/2503.16311
Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
arxiv.org/abs/2503.15905
SynCity: Training-Free Generation of 3D Worlds
arxiv.org/abs/2503.16420
Project page: research.paulengstler.com/syncity/
SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
arxiv.org/abs/2503.16396
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
arxiv.org/abs/2503.16430
Project page: yuqingwang1029.github.io/TokenBridge/
M3: 3D-Spatial MultiModal Memory
arxiv.org/abs/2503.16413
Project page: m3-spatial-memory.github.io
#ICLR2025
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
arxiv.org/abs/2503.14830
Project page: dp-recon.github.io
#CVPR2025
Cube: A Roblox View of 3D Intelligence
arxiv.org/abs/2503.15475
Project page: github.com/Roblox/cube
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
arxiv.org/abs/2503.14858
Project page: wang-kevin3290.github.io/scaling-crl/
Temporal Regularization Makes Your Video Generator Stronger
arxiv.org/abs/2503.15417
Project page: haroldchen19.github.io/FluxFlow/
Visual Persona: Foundation Model for Full-Body Human Customization
arxiv.org/abs/2503.15406
Project page: cvlab-kaist.github.io/Visual-Perso...
#CVPR2025
Object-Centric Pretraining via Target Encoder Bootstrapping
arxiv.org/abs/2503.15141
#ICLR2025
Code repository: github.com/djukicn/ocebo (coming soon)
TULIP: Towards Unified Language-Image Pretraining
arxiv.org/abs/2503.15485
Project page: tulip-berkeley.github.io
When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
arxiv.org/abs/2503.15096
Project page: github.com/yafeng19/T-C...
#CVPR2025
Humanoid Policy ~ Human Policy
arxiv.org/abs/2503.13441
Project page: human-as-robot.github.io
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
arxiv.org/abs/2503.14489
Project page: stable-virtual-camera.github.io
Impossible Videos
arxiv.org/abs/2503.14378
Project page: showlab.github.io/Impossible-V...
Bolt3D: Generating 3D Scenes in Seconds
arxiv.org/abs/2503.14445
Project page: szymanowiczs.github.io/bolt3d
MusicInfuser: Making Video Diffusion Listen and Dance
arxiv.org/abs/2503.14505
Project page: susunghong.github.io/MusicInfuser/
Deeply Supervised Flow-Based Generative Models
arxiv.org/abs/2503.14494
Project page: deepflow-project.github.io
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
arxiv.org/abs/2503.14487
Project page: shiml20.github.io/DiffMoE/
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
arxiv.org/abs/2503.13587
Project page: github.com/dk-liang/Uni...
DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
arxiv.org/abs/2503.14405
Project page: europe.naverlabs.com/research/pub...
#CVPR2025
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models
arxiv.org/abs/2503.14325
Project page: github.com/westlake-rep...
Less is More: Improving Motion Diffusion Models with Sparse Keyframes
arxiv.org/abs/2503.13859
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
arxiv.org/abs/2503.14324
Fast Autoregressive Video Generation with Diagonal Decoding
arxiv.org/abs/2503.14070
Make Your Training Flexible: Towards Deployment-Efficient Video Models
arxiv.org/abs/2503.14237
Code repository: github.com/OpenGVLab/Fl...
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
arxiv.org/abs/2503.12649