📢 Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image 📢
We directly regress neural parametric head models (NPHMs) from a single image – fast, stable, and significantly more expressive than classical 3DMMs such as FLAME.
Face tracking & 3D reconstruction are often limited by the representational capacity of PCA-based face models. By lifting NPHMs to a first-class reconstruction primitive, we enable more accurate geometry, richer expressions, and finer animation control.
Key to successful and generalized training of our ViT-based network are:
(1) large-scale registration of existing 3D head datasets, and
(2) self-supervised training on vast in-the-wild 2D video datasets using pseudo ground-truth surface normals.
Finally, we show that geometry-aware pretraining on pixel-aligned reconstruction tasks significantly outperforms generic visual pretraining (e.g., DINO-style features) in terms of generalization.
Pix2NPHM obtains fast and reliable NPHM reconstructions on real-world data. Inference-time optimization against surface normals and canonical point maps can further increase fidelity.
Great work by Simon Giebenhain, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, and Zhe Chen!
🌍 https://simongiebenhain.github.io/Pix2NPHM
🎥 https://youtu.be/MgpEJC5p1Ts
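To make the inference-time refinement step concrete, here is a minimal sketch of optimizing regressed latent codes against predicted surface normals. Everything named here (NPHMDecoder, surface_normals, the code dimensions) is a hypothetical stand-in for illustration, not the authors' actual API; in the real system the targets would come from the network's pixel-aligned normal and point-map predictions rather than random tensors.

```python
# Hedged sketch: refine regressed NPHM latent codes against predicted
# surface normals. NPHMDecoder and surface_normals are illustrative
# stand-ins; only the optimization pattern matters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NPHMDecoder(nn.Module):
    """Toy stand-in for a frozen neural parametric head model (SDF)."""
    def __init__(self, id_dim=64, ex_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(id_dim + ex_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 1))  # signed distance per query point

    def forward(self, z_id, z_ex, pts):
        z = torch.cat([z_id, z_ex], dim=-1).expand(pts.shape[0], -1)
        return self.mlp(torch.cat([z, pts], dim=-1))

def surface_normals(decoder, z_id, z_ex, pts):
    """Normals as normalized SDF gradients at the query points."""
    pts = pts.requires_grad_(True)
    sdf = decoder(z_id, z_ex, pts)
    (grad,) = torch.autograd.grad(sdf.sum(), pts, create_graph=True)
    return F.normalize(grad, dim=-1)

decoder = NPHMDecoder()
z_id = torch.zeros(1, 64, requires_grad=True)      # regressed identity code
z_ex = torch.zeros(1, 32, requires_grad=True)      # regressed expression code
pts = torch.rand(1024, 3)                          # query points on the head
target = F.normalize(torch.rand(1024, 3), dim=-1)  # predicted normals (dummy)

opt = torch.optim.Adam([z_id, z_ex], lr=1e-2)
for _ in range(50):
    opt.zero_grad()
    normals = surface_normals(decoder, z_id, z_ex, pts)
    loss = (1 - (normals * target).sum(dim=-1)).mean()  # cosine distance
    loss.backward()
    opt.step()
```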
📢 Intrinsic Image Fusion for Multi-View 3D Material Reconstruction 📢
We combine generative material priors with inverse path tracing:
1) define a parametric texture space,
2) fuse monocular predictions across views into consistent textures,
3) optimize low-dimensional parameters for physically-grounded reconstructions.
The results are relightable PBR textures for 3D scenes: check out the result on a real-world 3D scan from the ScanNet++ dataset!
Great work by Peter Kocsis and Lukas Hollein!
🌍 https://peter-kocsis.github.io/IntrinsicImageFusion
🎥 https://youtu.be/-Vs3tR1Xl7k
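As a rough illustration of steps 1) and 2), the sketch below fits the coefficients of a low-dimensional parametric texture to per-view monocular albedo predictions. The linear texture basis, the resolutions, and the simple L1 fusion objective are assumptions made for this example; the actual method optimizes physically grounded material parameters via inverse path tracing.

```python
# Minimal sketch (not the authors' code): fuse per-view monocular albedo
# predictions into one shared UV texture by optimizing low-dimensional
# texture coefficients.
import torch

views = 8
H = W = 64       # UV texture resolution (assumed)
basis_k = 16     # low-dimensional texture parameterization (assumed)

# Hypothetical inputs: per-view albedo predictions and precomputed UV
# coordinates telling where each pixel lands in texture space.
pred_albedo = torch.rand(views, 1024, 3)
uv = torch.rand(views, 1024, 2)

basis = torch.randn(basis_k, H * W * 3) / basis_k**0.5  # fixed texture basis
coeff = torch.zeros(basis_k, requires_grad=True)        # parameters to fit

# Nearest-neighbor texel lookup per pixel (constant across iterations).
ij = (uv * torch.tensor([H - 1, W - 1])).long()

opt = torch.optim.Adam([coeff], lr=5e-2)
for _ in range(200):
    tex = (coeff @ basis).view(H, W, 3).sigmoid()  # decoded texture
    sampled = tex[ij[..., 0], ij[..., 1]]          # (views, 1024, 3)
    loss = (sampled - pred_albedo).abs().mean()    # L1 fusion loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```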
Today in our TUM AI Lecture Series we'll have the amazing Ruiqi Gao from Google DeepMind.
She'll talk about "Building generative world models: progress and challenges".
Live stream: www.youtube.com/live/CkOSMqw...
7pm GMT+1 / 10am PST (Tue Dec 16th).
📢📢 PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing 📢📢
PercHead reconstructs realistic 3D heads from a single image and enables disentangled 3D editing via geometric controls and style inputs from images or text.
At its core is a generalized 3D head decoder trained with perceptual supervision from DINOv2 and SAM 2.1. We find that our new perceptual loss formulation improves reconstruction fidelity compared to commonly used methods such as LPIPS.
Our trained reconstruction model is able to generate 3D-consistent heads from a single input image. Even with challenging side-view inputs, the model robustly infers missing regions to produce a coherent, high-fidelity output.
In addition, our architecture seamlessly adapts to downstream tasks: by swapping the encoder, we can transform the model into a disentangled 3D editing pipeline. In this scenario, we control geometry through (potentially hand-drawn) segmentation maps and condition style via an image or text prompt.
We also provide an interactive GUI to enable the exploration of our editing pipeline.
Great work by Antonio Oroz and Tobias Kirschstein!
🌍 antoniooroz.github.io/PercHead/
📽️ youtu.be/4hFybgTk4kE
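For readers unfamiliar with feature-space perceptual losses, the sketch below shows the generic recipe of comparing pretrained ViT features of rendered and target images, here with a DINOv2 backbone loaded from torch.hub. PercHead's exact formulation (including the SAM 2.1 branch) differs and is described in the paper; this is only the common pattern.

```python
# Generic DINOv2 feature-space perceptual loss (common recipe, not
# PercHead's exact formulation).
import torch
import torch.nn.functional as F

# DINOv2 backbone from torch.hub (downloads weights on first use).
dino = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
dino.eval().requires_grad_(False)  # frozen; gradients still flow to inputs

def perceptual_loss(rendered, target):
    """Compare patch-token features of rendered vs. target images.
    Inputs: (B, 3, H, W) with H, W divisible by 14, suitably normalized."""
    f_r = dino.forward_features(rendered)['x_norm_patchtokens']
    f_t = dino.forward_features(target)['x_norm_patchtokens']
    return 1 - F.cosine_similarity(f_r, f_t, dim=-1).mean()

rendered = torch.rand(1, 3, 224, 224, requires_grad=True)
target = torch.rand(1, 3, 224, 224)
loss = perceptual_loss(rendered, target)
loss.backward()  # gradients w.r.t. the rendered image
```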
#ICCV last week was incredible – catching up with so many people, chatting about research, and, most importantly, having lots of fun.
Still hard to fathom this privilege as a researcher – getting to travel to such amazing places and be part of this brilliant community. Thanks!
The hot topic at #ICCV2025 was World Models.
They come in different flavors – (interactive) video models, neural simulators, reconstruction models, etc. – but the overarching goal is clear: generative AI that predicts and simulates how the real world works.
(Image: Hawaii at the same scale as the United Kingdom.)
Generate ergo sum – I generate, therefore I am.
Huge thanks to Yueh-Cheng Liu, as well as Chandan Yeshwanth and @niessner.bsky.social for their incredible work!
For more documentation: github.com/scannetpp/sc...
In the 'early days' of modern deep learning (2012-2015), when ConvNets such as AlexNet or VGG came out, it was considered almost impractical to train an ImageNet classifier from scratch.
The required compute was typically a couple of GPUs in a single desktop machine, with training runs lasting several days; e.g., AlexNet was trained on two GTX 580 3GB GPUs for 5-6 days.
Given the humongous compute demands of recent generative frontier AI models (LLMs, image and video models, etc.), where compute is measured in gigawatts, these challenges seem quite amusing today.
On the bright side, tooling for training has improved dramatically since then. Deep learning frameworks (PyTorch et al.) and scheduling systems such as SLURM or Kubernetes have become the backbone of modern AI.
Fantastic retreat this weekend by our research groups!
Internal reviews, brainstorming ideas, paper reading, and much more! Of course also many social activities – the highlight being our kayaking trip. Lots of fun :)
All six of our submissions were accepted to #NeurIPS2025 🎉🥳
Awesome works on Gaussian Splatting Primitives, Lighting Estimation, Texturing, and much more GenAI :)
Great work by Peter Kocsis, Yujin Chen, Zhening Huang, Jiapeng Tang, Nicolas von Lützow, and Jonathan Schmidt 🔥🔥🔥
Can we use video diffusion to generate 3D scenes?
WorldExplorer (#SIGGRAPHAsia25) creates fully navigable scenes via autoregressive video generation.
Text input -> 3DGS scene output & interactive rendering!
We generate multiple videos along short, pre-defined trajectories that explore the scene in depth. Our scene memory conditions each video on the most relevant prior views while avoiding collisions.
Great work by Manuel Schneider & @LukasHollein!
🌍 http://mschneider456.github.io/world-explorer/
📽️ https://youtu.be/N6NJsNyiv6I
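As a toy illustration of the scene-memory idea (not the authors' implementation), the snippet below picks the k most relevant prior views for conditioning the next video, scoring stored camera poses by positional proximity and viewing-direction agreement. The scoring heuristic and all names are assumptions made for this example.

```python
# Illustrative scene-memory lookup: which prior views should condition
# the next trajectory video? Score by camera proximity and direction.
import torch
import torch.nn.functional as F

def select_memory_views(prev_pos, prev_dirs, query_pos, query_dir, k=4):
    """prev_pos: (N, 3) camera centers, prev_dirs: (N, 3) unit view dirs."""
    dist = (prev_pos - query_pos).norm(dim=-1)   # positional proximity
    align = (prev_dirs * query_dir).sum(-1)      # direction agreement
    score = -dist + align                        # higher is better (assumed)
    return score.topk(min(k, len(score))).indices

prev_pos = torch.rand(32, 3)
prev_dirs = F.normalize(torch.rand(32, 3), dim=-1)
idx = select_memory_views(prev_pos, prev_dirs,
                          torch.rand(3),
                          F.normalize(torch.rand(3), dim=0))
print(idx)  # indices of prior views to condition the next video on
```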
ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions (#SIGGRAPH)
We reconstruct ultra-high-fidelity photorealistic 3D avatars capable of generating realistic, high-quality animations, including freckles and other fine facial details.
We operate on patch-based local expression features and increase representation capacity by synthesizing 3D Gaussians dynamically, leveraging tiny scaffold MLPs conditioned on localized expressions.
We further propose a color-based densification and progressive training scheme for improved quality and faster convergence.
Great work by Shivangi Aneja, Sebastian Weiss, Irene Baeza Rojo, Prashanth Chandran, Gaspard Zoss, and Derek Bradley!
shivangi-aneja.github.io/projects/sca...
youtu.be/VyWkgsGdbkk
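To illustrate the scaffold-MLP idea, here is a minimal sketch of a tiny MLP that maps a localized expression code plus an anchor position to a handful of dynamic 3D Gaussian parameters. The dimensions, activations, and names are illustrative assumptions, not the paper's exact design.

```python
# Hedged sketch of the scaffold-MLP idea: a tiny MLP per anchor that
# turns a localized expression code into dynamic Gaussian parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaffoldMLP(nn.Module):
    def __init__(self, expr_dim=16, gaussians_per_anchor=4):
        super().__init__()
        self.k = gaussians_per_anchor
        self.net = nn.Sequential(
            nn.Linear(expr_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, self.k * (3 + 3 + 4 + 3)))  # offset, scale, rot, color

    def forward(self, expr_code, anchor_xyz):
        out = self.net(torch.cat([expr_code, anchor_xyz], dim=-1))
        out = out.view(*out.shape[:-1], self.k, 13)
        xyz = anchor_xyz.unsqueeze(-2) + 0.01 * out[..., 0:3]  # local offsets
        scale = out[..., 3:6].exp().clamp(max=0.05)            # Gaussian scales
        rot = F.normalize(out[..., 6:10], dim=-1)              # unit quaternion
        color = out[..., 10:13].sigmoid()                      # RGB in [0, 1]
        return xyz, scale, rot, color

mlp = ScaffoldMLP()
expr = torch.rand(128, 16)    # per-patch localized expression features
anchors = torch.rand(128, 3)  # scaffold anchor points on the head surface
xyz, scale, rot, color = mlp(expr, anchors)
print(xyz.shape)  # (128, 4, 3): dynamically synthesized Gaussians per anchor
```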