This work was led by our amazing intern Oindrila Saha (UMass Amherst -- now at Adobe!), with Vojtech Krs, Radomir Mech, Kevin Blackburn-Matzen, and Subhransu Maji (UMass Amherst).
#ICLR2026 #ImageGeneration #ComputerVision
Posts by Matheus Gadelha
To train and evaluate at this scale, we introduce SIGMA-Set27K: a synthetic dataset with 100k+ unique subjects across 27k images, providing identity, structure, and spatial annotations. Training data at this level of annotation density did not previously exist for this task.
The framework also supports single- and multi-subject insertion in one pass, subject reposing, free-form masks not seen during training, and mixing different granularity levels in a single generation.
Subject identity is preserved while style varies freely via text. The same objects can be re-lit, re-stylized, or reposed without losing their appearance.
SIGMA-Gen also accepts spatial guidance at varying levels of precision — coarse 2D bounding boxes, 3D boxes, or pixel-level segmentation masks and depth maps — with a single model. You provide as much structure as you have; the model fills the rest.
The key challenge: placing multiple specific subjects in one scene while preserving each identity. Prior methods handle subjects independently or lose fidelity at compositing time. SIGMA-Gen does this in a single forward pass.
Project page: oindrilasaha.github.io/SIGMA-Gen/
Text-to-image models are remarkable at generation, but they end up deciding on their own what goes where. If you need specific subjects in a specific arrangement, you are left writing prompts and sampling until something usable appears. SIGMA-Gen, our new ICLR 2026 paper, tries to change that.
I am very likely guilty of this myself, but I agree and really appreciate your take!
PSA
(this sounds like something pretty important that I should know but could have lived the rest of my life in shame without ever learning it)
🚀 Excited to share REPPO, a new on-policy RL agent!
TL;DR: Replace PPO with REPPO for fewer hyperparameter headaches and more robust training.
REPPO, led by @cvoelcker.bsky.social, will be presented at ICLR 2026. How does it work? 🧵👇
Great resource!
This is especially important for people starting their research careers.
Getting to know different research communities is really important! Technical insight is crucial, but science is ultimately a social endeavor -- understanding and communicating to your peers is key.
As we in the US gather with our families this weekend, it's a good time to consider the ways that technology brings us together—but also keeps us apart. In this blog post, I tell a story about how technology may have tended to increase social isolation.
aaronhertzmann.com/2025/10/26/i...
Breaking: we release SYNTH, a fully synthetic generalist dataset for pretraining, plus two new SOTA reasoning models trained exclusively on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range. pleias.fr/blog/blogsyn...
Session this afternoon (in 30 minutes)!!
Poster 153 — see you there!
I wrote a notebook for a lecture/exercise on image generation with flow matching. The idea is to use FM to render images composed of simple shapes using their attributes (type, size, color, etc). Not super useful, but fun and easy to train!
colab.research.google.com/drive/16GJyb...
Comments welcome!
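The core of such a flow-matching exercise fits in a few lines. Below is a minimal sketch (not the notebook's actual code): a linear velocity field trained on a 2D toy problem with a deliberately easy paired coupling (`x1 = x0 + mu`), so no neural network is needed; `mu`, the learning rate, and all names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])     # toy "attribute": where the target data lives
dim = 2

# Linear velocity field v(x, t) = W @ [x, t, 1] -- a stand-in for the small
# network one would actually train in a notebook.
W = rng.normal(scale=0.1, size=(dim, dim + 2))

def features(x, t):
    return np.concatenate([x, [t, 1.0]])

lr = 0.02
for _ in range(10_000):
    x0 = rng.normal(size=dim)      # source sample ~ N(0, I)
    x1 = x0 + mu                   # paired target: an easy coupling for the demo
    t = rng.uniform()
    xt = (1 - t) * x0 + t * x1     # linear probability path
    target = x1 - x0               # conditional velocity (here just mu)
    phi = features(xt, t)
    grad = np.outer(W @ phi - target, phi)   # grad of 0.5 * ||v - target||^2
    W -= lr * grad

# Sampling: Euler-integrate dx/dt = v(x, t) from t = 0 to t = 1.
steps = 100
x = rng.normal(size=(500, dim))
for i in range(steps):
    t = i / steps
    phis = np.hstack([x, np.full((len(x), 1), t), np.ones((len(x), 1))])
    x = x + (phis @ W.T) / steps

print(x.mean(axis=0))   # the sample cloud drifts toward mu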
Oh nvm I read “our ICCV paper…” haha
Are the results out? I see nothing in OpenReview :-(
For folks attending CVPR: is there a website where I can see the list of workshops, their location AND time? Day and time are empty when I access cvpr.thecvf.com/Conferences/...
I will be in Nashville until Saturday for CVPR'25 \o/
DM if you want to meet!
Wilhem receiving the award on stage
🏅Honored to have been awarded at #Eurographics25 for our paper on #LipschitzPruning to speed up SDF rendering!
👉 The paper's page: wbrbr.org/publications...
Congrats to @wbrbr.bsky.social, M. Sanchez, @axelparis.bsky.social, T. Lambert, @tamyboubekeur.bsky.social, M. Paulin and T. Thonat!
I usually write a Python script that prints some .npy file as a .tex table. It is also useful as an easy way to share results throughout the project, so I consider it part of the codebase. I heard that people who are more serious about this practice use something like github.com/JelteF/PyLaTeX
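For concreteness, a minimal version of such a script might look like the sketch below; the filename, column names, and number format are made up for the example.

```python
import numpy as np

def npy_to_tex(path, col_names, fmt="{:.2f}"):
    """Render a 2D .npy results array as a LaTeX tabular."""
    data = np.load(path)
    rows = [" & ".join(col_names) + r" \\ \hline"]   # header row
    for row in data:
        rows.append(" & ".join(fmt.format(v) for v in row) + r" \\")
    body = "\n".join(rows)
    return "\\begin{tabular}{%s}\n%s\n\\end{tabular}" % ("c" * data.shape[1], body)

if __name__ == "__main__":
    # Hypothetical results file: two methods, two metrics.
    np.save("results.npy", np.array([[0.91, 0.88], [0.87, 0.90]]))
    print(npy_to_tex("results.npy", ["Acc", "F1"]))
```

Pasting the printed block into a paper (or piping it into a shared .tex file) keeps the table in sync with the experiments.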
I started doing it because of that, and because of vim, but even on Overleaf it pays off.
I don't use it for things I do alone (notes, presentations, etc). But for collaborative work it's pretty much mandatory. Students will revolt if you use git hahaha
The NeurIPS and SIGGRAPH Asia deadlines are coming.
Make your life easier: read this thread.
Let's gooo!!! \o/
Probably my first time visiting Brazil for professional reasons :-)
What features did you find particularly useful?
I liked asking questions about the code base and the tab completion seems nice, but I've been getting unhelpful suggestions for all the "agentic" stuff.
By popular demand, we are extending #CVPR2025 coverage to Bluesky. Stay tuned!
Exciting news! MegaSAM code is out🔥 & the updated Shape of Motion results with MegaSAM are really impressive! A year ago I didn't think we could make any progress on these videos: shape-of-motion.github.io/results.html
Huge congrats to everyone involved and the community 🎉
*it
I understand the sentiment, but it is important for people to know that it currently does not reflect reviewer guidelines at CVPR: cvpr.thecvf.com/Conferences/...
“(…) you should include specific feedback on ways the authors can improve their papers.”