Semisance (@semisance) Bsky

Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
arxiv.org/abs/2503.16318

1 year ago

Structured-Noise Masked Modeling for Video, Audio and Beyond
arxiv.org/abs/2503.16311

1 year ago

Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
arxiv.org/abs/2503.15905

1 year ago

SynCity: Training-Free Generation of 3D Worlds
arxiv.org/abs/2503.16420

Project page: research.paulengstler.com/syncity/

1 year ago

SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
arxiv.org/abs/2503.16396

1 year ago

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
arxiv.org/abs/2503.16430

Project page: yuqingwang1029.github.io/TokenBridge/

1 year ago

M3: 3D-Spatial MultiModal Memory
arxiv.org/abs/2503.16413

Project page: m3-spatial-memory.github.io

#ICLR2025 🎉

1 year ago

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
arxiv.org/abs/2503.14830

Project page: dp-recon.github.io

#CVPR2025 🎉

1 year ago

Cube: A Roblox View of 3D Intelligence
arxiv.org/abs/2503.15475

Project page: github.com/Roblox/cube

1 year ago

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
arxiv.org/abs/2503.14858

Project page: wang-kevin3290.github.io/scaling-crl/

1 year ago

Temporal Regularization Makes Your Video Generator Stronger
arxiv.org/abs/2503.15417

Project page: haroldchen19.github.io/FluxFlow/

1 year ago

Visual Persona: Foundation Model for Full-Body Human Customization
arxiv.org/abs/2503.15406

Project page: cvlab-kaist.github.io/Visual-Perso...

#CVPR2025 🎉

1 year ago

Object-Centric Pretraining via Target Encoder Bootstrapping
arxiv.org/abs/2503.15141

#ICLR2025 🎉

Code repository: github.com/djukicn/ocebo (coming soon)

1 year ago

TULIP: Towards Unified Language-Image Pretraining
arxiv.org/abs/2503.15485

Project page: tulip-berkeley.github.io

1 year ago

When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
arxiv.org/abs/2503.15096

Project page: github.com/yafeng19/T-C...

#CVPR2025 🎉

1 year ago

Humanoid Policy ~ Human Policy
arxiv.org/abs/2503.13441

Project page: human-as-robot.github.io

1 year ago

Stable Virtual Camera: Generative View Synthesis with Diffusion Models
arxiv.org/abs/2503.14489

Project page: stable-virtual-camera.github.io

1 year ago

Impossible Videos
arxiv.org/abs/2503.14378

Project page: showlab.github.io/Impossible-V...

1 year ago

Bolt3D: Generating 3D Scenes in Seconds
arxiv.org/abs/2503.14445

Project page: szymanowiczs.github.io/bolt3d

1 year ago

MusicInfuser: Making Video Diffusion Listen and Dance
arxiv.org/abs/2503.14505

Project page: susunghong.github.io/MusicInfuser/

1 year ago

Deeply Supervised Flow-Based Generative Models
arxiv.org/abs/2503.14494

Project page: deepflow-project.github.io

1 year ago

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
arxiv.org/abs/2503.14487

Project page: shiml20.github.io/DiffMoE/

1 year ago

Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
arxiv.org/abs/2503.13587

Project page: github.com/dk-liang/Uni...

1 year ago

DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
arxiv.org/abs/2503.14405

Project page: europe.naverlabs.com/research/pub...

#CVPR2025 🎉

1 year ago

LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models
arxiv.org/abs/2503.14325

Project page: github.com/westlake-rep...

1 year ago

Less is More: Improving Motion Diffusion Models with Sparse Keyframes
arxiv.org/abs/2503.13859

1 year ago

DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
arxiv.org/abs/2503.14324

1 year ago

Fast Autoregressive Video Generation with Diagonal Decoding
arxiv.org/abs/2503.14070

1 year ago

Make Your Training Flexible: Towards Deployment-Efficient Video Models
arxiv.org/abs/2503.14237

Code repository: github.com/OpenGVLab/Fl...

1 year ago

FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
arxiv.org/abs/2503.12649

1 year ago
