I think many of them are actually quite good artistically
Posts by Mario
Ruining great art with the nano banana pro command "Make this much more cheerful with as few changes as possible"
"The Principles of Diffusion Models" by Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon. arxiv.org/abs/2510.21890
It might not be the easiest intro to diffusion models, but this monograph is an amazing deep dive into the math behind them and all the nuances
The main ingredient that led to GRPO's performance leap is the calibration of the reward/value via multiple rollouts per prompt.
Let me elaborate on what I mean by that and a cheaper way of doing it offline.
To understand PPO, GRPO, or any policy-gradient algorithm, you first need to understand REINFORCE. I've written my notes here.
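The "calibration via multiple rollouts per prompt" amounts to a group-relative baseline: sample several completions for the same prompt, then normalize each reward by the group's mean and standard deviation. A minimal sketch of that idea (not DeepSeek's exact implementation; names are illustrative):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: for rollouts sampled from the SAME prompt,
    subtract the group mean and divide by the group std. This replaces a
    learned value function (as in PPO) with an empirical per-prompt baseline."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts for one prompt, scored by a reward model:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Above-average rollouts get positive advantage, below-average negative.
```

The per-prompt normalization is what calibrates the signal: a reward of 1.0 only counts as "good" relative to the other rollouts for that same prompt.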
New video on the details of diffusion models: youtu.be/iv-5mZ_9CPY
Produced by Welch Labs, this is the first in a short series of 3b1b this summer. I enjoyed providing editorial feedback throughout the last several months, and couldn't be happier with the result.
asking grok 4 for its opinions on israel palestine it first searches to see what Elon musk thinks
Well, a single week was enough to provide a convincing case that a Wikipedia equivalent for LLMs is necessary, i.e., decentralized LLM training and serving
Diffusion models have analytical solutions, but they involve sums over the entire training set, and they don't generalise at all. They are mainly useful to help us understand how practical diffusion models generalise.
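For a Gaussian diffusion x_t = α_t·x_0 + σ_t·ε, the analytical optimal denoiser E[x_0 | x_t] under the empirical data distribution is a softmax-weighted average over the whole training set, which is why it memorizes rather than generalizes. A toy sketch (variable names are my own, not from any codebase):

```python
import numpy as np

def optimal_denoiser(x_t, data, alpha_t, sigma_t):
    """Closed-form E[x0 | x_t] when the data distribution is exactly the
    empirical training set: a convex combination of training points,
    weighted by the Gaussian likelihood of x_t under each one."""
    # log N(x_t; alpha_t * x_i, sigma_t^2 I), up to a shared constant
    logits = -np.sum((x_t - alpha_t * data) ** 2, axis=-1) / (2 * sigma_t**2)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ data  # always lands inside the training set's convex hull

data = np.array([[0.0, 0.0], [1.0, 1.0]])  # toy 2-point "training set"
x0_hat = optimal_denoiser(np.array([0.9, 0.9]), data, alpha_t=1.0, sigma_t=0.5)
```

The output can only ever be a blend of training points, so nothing new is generated; comparing practical models against this closed form is what makes it a useful lens on generalization.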
Nice blog + code by Raymond Fan: rfangit.github.io/blog/2025/op...
Sharing what I like in case you do too. Great book, @natolambert.bsky.social. It's exactly what I was looking for. rlhfbook.com
Well, the last thing I said can be done today too, so that's not a big benefit; maybe there's something else
Yeah, I agree, it's more out of curiosity to see if interpolation would be ≥, where my guess is that it would be at least on par. Like you said, it might be interesting for some interpretability study, e.g., examining how the AdaLN parameters vary with the interpolation coefficient
Interesting that you're also not seeing a clear reason why this should fail!
Also, thanks for the great links! 2/2
Yes, this is exactly my idea, Fourier emb is implicitly making assumptions about which timesteps matter more, and I suspect that makes it harder for downstream transf (AdaLN etc) to use, compared to simple linear interp, considering how these emb are later used. 1/2
But since the range is fixed, wouldn't interpolating two e.g. 256d random (or not) vectors work just as well, or even better, since it doesn't bias any specific timestep (while standard PE does)? I don't have an intuition for why this would be wrong. 2/2
Thanks @sedielem.bsky.social! I remember this reason for PE. What I'm unsure about is whether it still applies when encoding a timestep as 0-1. In the Flux code it's first mapped to 0-1000, so I guess your point about needing enough granularity holds. 1/2
Any intuition on using max_period=10k for t sinusoidal PE in, e.g, Flux? Since t enters via AdaLN and range is fixed, would linear interpol. two high dim vec work too? Does PE 10k make it easier to distinguish small t and harder for large t? Maybe @sedielem.bsky.social @stefanabaumann.bsky.social
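The two options discussed in this thread can be sketched side by side: the standard sinusoidal timestep embedding with log-spaced frequencies down to 1/max_period, and the proposed linear interpolation of two fixed vectors. A minimal sketch (not Flux's actual code; function names are my own):

```python
import math

def timestep_embedding(t, dim, max_period=10000.0):
    """Standard sinusoidal timestep embedding: frequencies are log-spaced
    from 1 down to 1/max_period. Small changes in t mostly move the
    high-frequency dims, which is the 'granularity' point in the thread."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]

def interp_embedding(t, v0, v1, t_max=1000.0):
    """The alternative floated above: linearly interpolate two fixed
    high-dim vectors by t/t_max. Unbiased across timesteps, but every dim
    varies at the same slow rate, so nearby t's get near-identical codes."""
    a = t / t_max
    return [(1 - a) * x0 + a * x1 for x0, x1 in zip(v0, v1)]
```

The contrast makes the tradeoff concrete: sinusoidal PE spends capacity on resolving nearby timesteps, while linear interpolation treats the whole range uniformly and leaves any nonlinearity to the downstream AdaLN layers.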
No, in this country we don't tolerate senators being forced to the ground and cuffed for asking questions. We don't tolerate masked agents rounding people up on the streets and disappearing them in unmarked vans. And we sure as hell don't tolerate
It is critical for scientific integrity that we trust our measure of progress.
The @lmarena.bsky.social has become the go-to evaluation for AI progress.
Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
Black swans are real
Harvard is Dancing Guy. Who is the first follower?
m.youtube.com/watch?v=fW8a...
Workers are not asking to get rich. They just want to afford three meals a day.
In the richest country in the history of the world, no one should work for starvation wages.
It is time to raise the disgraceful $7.25/hr federal minimum wage to a living wage of AT LEAST $17/hr.
Maybe the tradeoff is that adv loss pushes toward the prototype rather than the input, making the image more "realistic," while perc loss keeps it close to the input, with less perceptual distortion. If the input is atypical, the two losses might go in different directions and balance each other
Human heads are out of fashion with the new regime
🚨 New preprint!
How far can we go with ImageNet for Text-to-Image generation? w. @arrijitghosh.bsky.social @lucasdegeorge.bsky.social @nicolasdufour.bsky.social @vickykalogeiton.bsky.social
TL;DR: Train a text-to-image model using 1000× less data in 200 GPU hrs!
https://arxiv.org/abs/2502.21318
🧵👇
Some sentences are beautiful, some are not (perhaps because they lack context), but have we truly read the book? Sometimes I wonder if I'm really savoring beautiful things, or if there are simply too many to pause and fully appreciate the depth of their beauty. 2/2
I think the problem with visual art is that you see it immediately. Then you move on to the next piece, and once again, you've taken it in within seconds. It's like opening a book, reading a few sentences here and there, and then closing it. 1/2