
Posts by Chris Offner

Good to know, thank you! 🙏

3 weeks ago 1 0 0 0
Post image

Huge props to Lord and Miller for stepping up and doing what directors like Nolan are too cowardly to do: be up front and give loud and generous credit to their VFX team. Great directors don’t need to lie about how their movies are made; the work speaks for itself.

3 weeks ago 22 3 1 1
Post image

Any idea why Scholar Inbox cannot find the paper arxiv.org/abs/2512.11508? @si-cv-graphics.bsky.social @andreasgeiger.bsky.social

3 weeks ago 2 0 1 0

If you want me to consider reading something, you have to convince me that you care way more than I do.

1 month ago 74 6 1 1

All the "you need to learn AI skills or you'll get left behind" things are patently nonsense. It's easy to use and only becomes easier to use over time. If there's skill, it's in knowing what it does well and what it does poorly.

2 months ago 307 17 4 11

This is a really good point. While there are men and women on the 'sceptic' side of all these debates, I don't know of any women on the AI 'booster' side. It's really a guy thing.

4 months ago 27 2 5 1
Post image

Looking forward to a busy #ICCV2025.

I will give three (very different) talks at workshops and tutorials, see info below.

We also present two papers, ACE-G and SCR Priors.

And it's the 10th (!) anniversary of the R6D workshop, which we co-organize.

6 months ago 12 4 1 1
Video

#TTT3R: 3D Reconstruction as Test-Time Training
TTT3R offers a simple state update rule to enhance length generalization for #CUT3R — No fine-tuning required!
🔗 Page: rover-xingyu.github.io/TTT3R
We rebuilt @taylorswift13’s "22" live at the 2013 Billboard Music Awards - in 3D!

6 months ago 38 4 0 4
A futuristic corridor inside a data center with rows of tall, blue-lit server racks on both sides. Text overlaid at the bottom reads "JUPITER Supercomputer: Europe enters the exascale supercomputing league." In the lower right corner, there is a logo of the European Commission.

🚀 Europe’s first exascale supercomputer is here!

JUPITER, launched in Germany, is the EU’s most powerful system and fourth fastest worldwide.

100% powered by renewables, it has also ranked first in energy efficiency. It will boost AI, science, and climate research.

Read more - europa.eu/!vcWBqW

7 months ago 226 52 11 6

There is a lot to hate about the politics of the silicon valley right, but they do actually want to build stuff, and I would prefer if the left didn't cede "we should be able to build stuff" to the right.

7 months ago 264 16 276 181

People often use "smart" when they mean "wise" and I don't think it's too controversial to doubt the wisdom of some tech elites. Other than that I certainly agree with you.

7 months ago 4 0 1 0
Post image

I can't fathom why the top picture, and not the bottom picture, is the standard diagram for an autoencoder.

The whole idea of an autoencoder is that you complete a round trip and seek cycle consistency—why lay out the network linearly?
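The round trip can be sketched in a few lines, e.g. as a toy linear encoder/decoder in NumPy (dimensions and weights are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "autoencoder": encode 8-D inputs into a 2-D code and back.
W_enc = rng.normal(size=(2, 8)) * 0.1   # encoder weights
W_dec = rng.normal(size=(8, 2)) * 0.1   # decoder weights

def round_trip(x):
    """One full cycle: input -> latent code -> reconstruction."""
    z = W_enc @ x        # encode into the bottleneck
    x_hat = W_dec @ z    # decode back into input space
    return x_hat

x = rng.normal(size=8)
x_hat = round_trip(x)

# The training objective is exactly cycle consistency:
# make the reconstruction x_hat match the original x.
loss = np.mean((x - x_hat) ** 2)
```

The objective only ever compares the end of the cycle to its start, which is why drawing the decoder folding back onto the input reads more naturally than a straight left-to-right pipeline.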

7 months ago 159 25 11 3

I love both.

7 months ago 1 0 0 0
Video

Great video on the convergent evolution from hierarchical military command structures to cybernetics to centralized AI coordination across political ideologies:
www.youtube.com/watch?v=mayo...

7 months ago 2 0 1 0
Gaussian Belief Propagation

I'd also welcome a Bayesian framing. I know Andrew Davison's group has done work on Gaussian belief propagation for SLAM factor graphs (gaussianbp.github.io) but other than that and arxiv.org/abs/1703.04977, I'm not aware of much Bayesian (deep) learning in (3D) vision right now.

7 months ago 4 0 0 0

In general I think 3D vision would do well to take some inspiration from Bayesians. I guess these days they lost their glamour, but imo it's a very nice way of thinking that feels somewhat lost currently.

7 months ago 2 1 2 0
Video

"It is beautiful. It is elegant. Does it work well in practice? Not really. This is often the caveat we face in research: the things that are beautiful don't work and the things that work are not beautiful." – Daniel Cremers

7 months ago 36 5 2 1

You follow him. Andrew Davison from Imperial College London.

7 months ago 0 0 1 0
Video

"As roboticists and computer vision people [outside of big tech], do we have to just wait for the next foundation model?"

I share the frustration. It's disempowering when most major progress recently is downstream of "foundation models" that you don't have the compute or data to train yourself.

8 months ago 24 2 5 0
Preview
Bibliome - Building the very best reading lists, together Create collaborative bookshelves, discover new books, and build reading communities with friends. Join the decentralized reading revolution powered by Bluesky.

We're live on bluesky! bibliome.club is the platform for creating, collaborating on and sharing reading lists with your Bluesky network - open source and decentralised via ATProto.

8 months ago 248 70 7 21
Preview
a man with a beard and glasses is making a funny face.
8 months ago 3 0 0 0
Post image

Sort of, but DINOv3 also seems to (inadvertently?) point towards the limits of pure scaling.
x.com/chrisoffner3...

8 months ago 3 0 2 0

If you maximize cosine similarity, aren't you left with only a single dimension (i.e. scaling the vector norm) as CosSim-invariant "wiggle room" to encode geometric information that isn't also captured by the language?
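This scale-invariance is easy to check: rescaling a vector leaves its cosine similarity to anything else unchanged, so only the norm is free to carry extra information (a toy NumPy check with made-up vectors):

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 3.0])
w = 0.5 * v  # same direction, half the norm

# Cosine similarity ignores magnitude: scaling v changes nothing.
s1 = cos_sim(v, w)        # ≈ 1.0
s2 = cos_sim(3.0 * v, w)  # still ≈ 1.0

# So once a feature's direction is pinned to the language embedding,
# only its norm remains as alignment-invariant "wiggle room".
```

Under this view, a pixel feature whose direction is fully determined by the text embedding has a single residual degree of freedom (its length) in which to stash geometry.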

8 months ago 0 0 0 0

Yes, but that's an additional training objective beyond merely maximizing cosine similarity. You'd need to introduce something that ensures that pixel features don't just collapse to language semantics, via some auxiliary task, no?

8 months ago 0 0 1 0

It just seems to me that mapping pixels and language to highly similar internal representations means that you'll drop a lot of information that is not (or cannot be) accurately described by language.

8 months ago 1 0 3 0

If we try to perfectly reconstruct, e.g., a complex 3D mesh from a natural language description, we'll find that the two modalities operate on very different levels of precision and abstraction.

8 months ago 0 0 0 0

My concern is that language as a modality inherently biases the data towards coarser labels/concepts. You won't perfectly describe per-pixel normals and depth in natural language. Geometry is continuous and "raw", language is discrete and abstract.

8 months ago 2 0 2 0

Oh, interesting. I'll check that out!

8 months ago 0 0 0 0
Post image Post image

Yay, DINOv3 is out!

SigLIP (VLMs) and DINO are two competing paradigms for image encoders.

My intuition is that joint vision-language modeling works great for semantic problems but may be too coarse for geometry problems like SfM or SLAM.

Most animals navigate 3D space perfectly without language.

8 months ago 31 5 1 1

What are the best resources to learn about VLMs? Papers, tutorials, courses, blog posts, whatever is good. I can read the Kimi-VL or GLM tech reports and follow the breadcrumbs but I'd appreciate any and all recommendations towards a useful VLM curriculum! 🙏

8 months ago 8 2 1 0