Remember how, when the lockdowns started, every organization said "we only have two weeks of cash on hand and will shut down if we don't get assistance"? That's basically happening to every single lab and NGO right now, except for no actual reason.
Posts by Valentin De Bortoli
For the French-speaking audience, S. Mallat's courses at the College de France on Data generation in AI by transport and denoising have just started. I highly recommend them, as I've learned a lot from the overall vision of his courses.
Recordings are also available: www.youtube.com/watch?v=5zFh...
Slides for a general introduction to the use of Optimal Transport methods in learning, with an emphasis on diffusion models, flow matching, training two-layer neural networks and deep transformers. speakerdeck.com/gpeyre/optim...
😍😍😍
lmbp.uca.fr/stflour/
I'm delighted to note that our paper InDI has been selected as one of two Outstanding Paper awardees by the Transactions on Machine Learning @tmlr-pub.bsky.social
We sincerely thank the expert reviewers, Action Editors, the Outstanding Paper Committee, and the Editors for this honor
1/3
The slides of my NeurIPS lecture "From Diffusion Models to Schrödinger Bridges - Generative Modeling meets Optimal Transport" can be found here
drive.google.com/file/d/1eLa3...
I love a good illustration 😍
After watching this beautiful keynote by @arnauddoucet.bsky.social , I *had* to give these Schrodinger bridges a try! Very interesting to be able to "straighten" a basic flow-matching approach. Super cool work by @vdebortoli.bsky.social & co-author!
Speaking at this #NeurIPS2024 workshop on a new analytic theory of creativity in diffusion models that predicts what new images they will create and explains how these images are constructed as patch mosaics of the training data. Great work by @masonkamb.bsky.social
scienceofdlworkshop.github.io
Don't miss it!
I've been getting a lot of questions about autoregression vs diffusion at #NeurIPS2024 this week! I'm speaking at the adaptive foundation models workshop at 9AM tomorrow (West Hall A), about what happens when we combine modalities and modelling paradigms.
adaptive-foundation-models.org
Fantastic #neurips keynote by Arnaud Doucet! Really like this slide tracing back many of the modern flow-matching / stochastic interpolants ideas to a 1986 result by probabilist Istvan Gyongy describing how to "Markovianize" a diffusion process (e.g. one whose coefficients depend on all the past)
yeah we tried to make it more accessible in arxiv.org/abs/2303.16852 and arxiv.org/abs/2409.09347 but we should definitely work on an easier intro, cc. @jamesthornton.bsky.social 👀
🔥You enjoyed @arnauddoucet.bsky.social talk but want even more Schrodinger Bridge? Come talk to me at our poster!
🔷Schrodinger Bridge Flow for Unpaired Data Translation
🔊 East Exhibit Hall A-C #2504
Work done with my amazing collaborators Ira Korshunova, Andriy Mnih and @arnauddoucet.bsky.social
It's located near the west entrance to the west side of the conference center, on the first floor, in case that helps!
When a bunch of diffusers sit down and talk shop, their flow cannot be matched😎
It's time for the #NeurIPS2024 diffusion circle!
🕒Join us at 3PM on Friday December 13. We'll meet near this thing, and venture out from there and find a good spot to sit. Tell your friends!
100% agree. OT is not (or rarely) a goal in itself but rather a means to enforce useful properties
Have you ever wondered why diffusion models memorize and all initializations lead to the same training sample? As we show, this is because, as in dynamical systems, the memorized sample acts as an attractor and a corresponding basin of attraction forms in the denoising trajectory.
Optimal transport, convolution, and averaging define interpolations between probability distributions. One can find vector fields advecting particles that match these interpolations. They are the Benamou-Brenier, flow-matching, and Dacorogna-Moser fields.
Iterated RF with conservative vector fields should get to OT, though training remains a challenge
arxiv.org/abs/2209.14577
Hellinger and Wasserstein are the two main geodesic distances on probability distributions. While both minimize the same energy, they differ in their interpolation methods: Hellinger focuses on density, whereas Wasserstein emphasizes position displacements.
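To make the density-versus-displacement contrast concrete, here is a minimal 1D sketch (not from the original post). It assumes two Gaussians on a grid, takes the Hellinger path as the renormalized straight line between square-root densities (which traces the same curve as the geodesic up to a reparametrization of time), and uses the closed-form Wasserstein-2 geodesic between 1D Gaussians. All function names and parameter values are illustrative choices.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def hellinger_interpolation(p0, p1, t, dx):
    """Interpolate linearly between square-root densities, then renormalize.

    Mass fades out at the source and fades in at the target; nothing moves.
    """
    root = (1.0 - t) * np.sqrt(p0) + t * np.sqrt(p1)
    density = root ** 2
    return density / (density.sum() * dx)


def wasserstein_interpolation_gaussian(x, m0, s0, m1, s1, t):
    """W2 geodesic between 1D Gaussians: mean and std interpolate linearly.

    Mass is transported: the bump slides from the source to the target.
    """
    return gaussian_pdf(x, (1.0 - t) * m0 + t * m1, (1.0 - t) * s0 + t * s1)


# Illustrative setup: two well-separated bumps (values are assumptions).
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
m0, s0, m1, s1 = -3.0, 0.5, 3.0, 0.5
p0 = gaussian_pdf(x, m0, s0)
p1 = gaussian_pdf(x, m1, s1)

h_mid = hellinger_interpolation(p0, p1, 0.5, dx)
w_mid = wasserstein_interpolation_gaussian(x, m0, s0, m1, s1, 0.5)

i0 = np.argmin(np.abs(x))  # grid index closest to x = 0
print("Hellinger midpoint density at 0:  ", h_mid[i0])
print("Wasserstein midpoint density at 0:", w_mid[i0])
```

At the midpoint the Hellinger path still has two bumps sitting at -3 and 3 with almost no mass in between, while the Wasserstein path is a single bump centered at 0.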
This is a really nice blogpost by @RuiqiGao and team that I enjoyed being a part of. My favorite key learnings are:
- DDIM sampler == flow matching sampling
- (Not) straight?
- SD3 weighting (Esser, Rombach, et al.) is very similar to the EDM weighting (Karras, et al.).
👇
ahah yeah apologies for this, I am slowly learning how to write for non-theoretical proba crowd but it's a process 😅
Yeah I was referring to the coupling obtained after the flow matching operation (or "Reflow"). It's an interesting object in itself which is not exactly OT but still exhibits *some* level of straightness.
New datasets from @polymathicai.bsky.social available on @hf.co will train AI models to think like scientists. Read more: www.simonsfoundation.org/2024/12/02/n... #science #AI #machinelearning
I am a broken record but yeah, totally agree. If you iterate FM on that coupling, though, you get OT (if you add a bit of noise). In the case of noisy FM we showed that the only coupling that is left invariant by noisy FM is the EOT one in arxiv.org/abs/2311.06978
A common question nowadays: Which is better, diffusion or flow matching? 🤔
Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
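One way to check the sampler side of this claim by hand is the toy sketch below (not taken from the blog post). It assumes the linear schedule x_t = (1 - t) x_0 + t eps, 1D Gaussian data so that the optimal denoiser is available in closed form, and hypothetical function names. Under those assumptions a deterministic DDIM update and an Euler step on the flow matching velocity eps_hat - x0_hat give the same result.

```python
import numpy as np


def posterior_mean_x0(x_t, t, data_mean, data_std):
    """E[x_0 | x_t] when x_0 ~ N(data_mean, data_std^2) and x_t = (1 - t) x_0 + t eps."""
    a = 1.0 - t
    var_t = a ** 2 * data_std ** 2 + t ** 2
    return data_mean + a * data_std ** 2 / var_t * (x_t - a * data_mean)


def predictions(x_t, t, data_mean, data_std):
    """Return (x0_hat, eps_hat), tied together by x_t = (1 - t) x0_hat + t eps_hat."""
    x0_hat = posterior_mean_x0(x_t, t, data_mean, data_std)
    eps_hat = (x_t - (1.0 - t) * x0_hat) / t
    return x0_hat, eps_hat


def ddim_step(x_t, t, t_next, data_mean, data_std):
    """Deterministic DDIM: re-noise the predicted clean sample to level t_next."""
    x0_hat, eps_hat = predictions(x_t, t, data_mean, data_std)
    return (1.0 - t_next) * x0_hat + t_next * eps_hat


def flow_matching_euler_step(x_t, t, t_next, data_mean, data_std):
    """Euler step along the flow matching velocity d/dt[(1 - t) x_0 + t eps]."""
    x0_hat, eps_hat = predictions(x_t, t, data_mean, data_std)
    velocity = eps_hat - x0_hat
    return x_t + (t_next - t) * velocity


def sample(step_fn, noise, n_steps, data_mean, data_std):
    """Integrate from t = 1 (pure noise) down to t = 0 (data)."""
    times = np.linspace(1.0, 0.0, n_steps + 1)
    x = noise.copy()
    for t, t_next in zip(times[:-1], times[1:]):
        x = step_fn(x, t, t_next, data_mean, data_std)
    return x


# Illustrative values (assumptions, not from the post).
rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)
data_mean, data_std, n_steps = 2.0, 0.5, 50

x_ddim = sample(ddim_step, noise, n_steps, data_mean, data_std)
x_fm = sample(flow_matching_euler_step, noise, n_steps, data_mean, data_std)

print("max |DDIM - FM Euler|:", np.max(np.abs(x_ddim - x_fm)))
print("sample mean:", x_fm.mean())
```

The identity behind it is one line: x_t + (s - t)(eps_hat - x0_hat) = (1 - s) x0_hat + s eps_hat whenever x_t = (1 - t) x0_hat + t eps_hat, so the two updates agree for any predictor, not just the Gaussian one used here.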
What you are showing is the coupling *before* the flow matching procedure though, right? After the flow matching procedure the coupling is modified (image from arxiv.org/abs/2209.03003)
(Specific to diffusion models) but goes in the direction of what Sander was suggesting: i.e. these models learn a somewhat robust coupling data/Gaussian
What about arxiv.org/abs/2310.05264
Yeah, in the sense of RF. Although RF won't get you to OT (Qiang Liu himself has a counterexample). But if you consider noisy flow matching (à la stochastic interpolants), then this procedure converges to EOT. Shameless plug + concurrent paper: arxiv.org/abs/2303.16852 + arxiv.org/abs/2304.00917