🌺 Just 4 days to go!
Join us in Honolulu for the Instance-Level Recognition and Generation Workshop at #ICCV2025 🏝
🗓️ Oct 19, 8:30am–12:30pm 📍 Room 306 A
We’ll have amazing keynotes, plus oral and poster sessions featuring accepted and invited papers.
Don’t miss it!
ilr-workshop.github.io/ICCVW2025/
Posts by Mark Boss
A few examples from the demo. You can generate various styles from a single prompt. You can either pick semantically matching images, or skip that and get unexpected results.
[1] unsplash.com/photos/a-boa...
[2] unsplash.com/photos/mount...
[3] unsplash.com/photos/man-i...
[4] unsplash.com/photos/macro...
Thanks to my co-authors Andreas Engelhardt, Simon Donné, and Varun Jampani
Also check out the HF demo huggingface.co/spaces/stabi..., the code github.com/Stability-AI..., and the explainer youtu.be/ckcSgf0s-jI
This can be used for multiple applications such as color matching or diffusion guidance. Here, we showcase the diffusion process of generating a medieval house with the reference image on the right.
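The color-matching use case can be sketched in plain NumPy: treat the pixels of each image as 3D color point clouds and iteratively push the source colors toward the reference via sliced optimal transport. This is a minimal illustration under my own assumptions (function names, sizes, and hyperparameters are made up, and it uses uniform random directions without the reservoir), not the ReSWD implementation.

```python
import numpy as np

def swd_color_match(src, ref, steps=100, k=16, lr=0.5, seed=0):
    """Push source pixel colors toward the reference palette by sliced
    optimal transport. src/ref: (n, 3) float arrays of equal size."""
    rng = np.random.default_rng(seed)
    x = src.astype(np.float64).copy()
    for _ in range(steps):
        dirs = rng.normal(size=(k, x.shape[1]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        px, py = x @ dirs.T, ref @ dirs.T        # (n, k) projections
        ix = np.argsort(px, axis=0)
        disp = np.zeros_like(px)
        # each pixel moves toward the reference value of equal rank
        np.put_along_axis(
            disp, ix,
            np.sort(py, axis=0) - np.take_along_axis(px, ix, axis=0),
            axis=0,
        )
        x += lr * (disp @ dirs) / k              # average back-projected step
    return x

rng = np.random.default_rng(1)
src = rng.random((256, 3))                       # random colors
ref = 0.7 + 0.2 * rng.random((256, 3))           # bright, clustered palette
out = swd_color_match(src, ref)                  # colors drift toward ref
```

With equal-size point sets, each step matches sorted 1D projections, so the source color distribution converges toward the reference palette.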
Variance in MC integration is quite common in computer graphics, so we combined ReSTIR -- more precisely, its weighted reservoir sampling -- with SWD to keep the most impactful random directions during the optimization.
Happy to announce: ReSWD. Sliced Wasserstein Distances are quite powerful, but they perform a Monte Carlo (MC) integration over random directions. During optimization, this can lead to noisy gradients due to variance.
Project page: reservoirswd.github.io
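The core idea can be sketched as follows: score each random projection direction by its contribution to the sliced distance, then keep high-impact directions via weighted reservoir sampling. A minimal NumPy sketch, assuming the A-Res weighted-sampling scheme and a simple W1 score (names and the scoring rule are illustrative, not the paper's exact algorithm):

```python
import numpy as np

def per_direction_w1(x, y, dirs):
    """Per-direction 1D Wasserstein-1 between equal-size point sets
    x and y, projected onto unit directions dirs: (k, d)."""
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    return np.abs(px - py).mean(axis=0)  # (k,): one score per direction

def weighted_reservoir(dirs, weights, m, rng):
    """A-Res weighted sampling: keep m directions with probability
    proportional to their weight (here, their W1 contribution)."""
    keys = rng.random(len(weights)) ** (1.0 / np.maximum(weights, 1e-12))
    return dirs[np.argsort(-keys)[:m]]

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 3))
y = rng.normal(loc=0.5, size=(256, 3))
dirs = rng.normal(size=(64, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
scores = per_direction_w1(x, y, dirs)
kept = weighted_reservoir(dirs, scores, 8, rng)  # high-impact directions survive
```

In an optimization loop, the kept directions would be carried into the next iteration alongside fresh samples, concentrating the MC budget on directions that actually move the loss.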
There was SV3D, where we mostly discarded the SDS loss (using it only for unseen areas). I mainly worked on the 3D part, and it required quite a few tricks to make it work. sv3d.github.io
3️⃣ MARBLE: Edit materials effortlessly using simple CLIP feature manipulation, supporting exemplar-based interpolation or parametric edits across various styles. Check it out: marblecontrol.github.io
2️⃣ SPAR3D (follow-up to SF3D): Integrates a fast point diffusion module, enhancing depth, backside modeling, and enabling easier editing. Project page: spar3d.github.io
1️⃣ SF3D: Generate textured, UV-unwrapped 3D assets with additional materials incredibly fast (<0.3s)! More details here: stable-fast-3d.github.io
I’m at CVPR this week! Looking forward to connecting and discussing all things graphics, 3D, and gen AI. I'll be presenting 3 papers—stop by and chat!
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Jensen (Jinghao)Zhou, Hang Gao, Vikram Voleti, @adyaman.bsky.social, Chun-Han Yao, @markboss.bsky.social, @philiptorr.bsky.social, Christian Rupprecht, Varun Jampani
arxiv.org/abs/2503.14489
Check out the HF demo to test the model: huggingface.co/spaces/stabi.... The model (huggingface.co/stabilityai/...) is also available with code and Comfy Nodes github.com/Stability-AI.... We also have a project page available at spar3d.github.io
An image showcasing editing of the point cloud representation to add a cup to a mug or a tail to a plush toy.
One neat implication is that we can edit the point cloud to fix missing features or wrong scaling. We even created a small gradio component for simple edits in the demo (pypi.org/project/grad...)
Happy to announce SPAR3D! A fast (<1s) single-image-to-3D reconstruction model that combines the best of diffusion and regression models, leveraging a point diffusion module to quickly produce an initial point cloud. This aids 3D understanding for the mesh estimation.
stability.ai/news/stable-...
A single procedural modeling system is a huge undertaking when you aim for a high quality level. Take SpeedTree, for example, which combines procedural aspects with hand-authored elements -- and it's an entire company dedicated to that.
Yes, I agree it can work for certain things. Simple cities (Manhattan-style) and natural landscapes fit rather well and are already explored heavily in video games. Going for interiors or arbitrary objects is another beast.
Realistic rendering is not the problem; even full path tracing is doable for room-scale scenes on GPU. It still requires some denoising, though, as otherwise rendering times are too long to generate any meaningful amount of data. But even then, data is the bottleneck.
It’s hard to scale 3D data the way we scale image or video data. We all carry capable cameras around all the time, but only a few people can model in 3D, and it takes time and isn’t offered for free (rightfully). So even if we paid every artist in the world, we still wouldn’t hit the scale of image and video.
I recently went with recreating the rooms in Blender. A lot of furniture websites now have 3D viewers, and you can download the models from devtools. They are also metrically sized. Then Blender becomes The Sims Pro, and you can iterate quite fast.
Would love to be added too ;)
Looking at ICLR submissions with the lowest score - What a work of art! 🧵
I used 📍🔗 emojis to maximize Twitter/Bluesky parity in my profile. This is definitely pointless, but it's fun.
bsky.app/profile/cspr... markboss.me/publication/... :D
I wasn’t aware that ads are not that bad as long as they are of good quality and diverse. Now I know.
Hi Kosta :). Can you also add me?
I had this account lying around for quite some time. It seems 🦋 is starting to take off. It's great to see many scientists here - and no weird gadget ads in between 😅