
Posts by scoff manifesto

Post image

lol

1 month ago 1 0 1 0

yeah. it's the same as lifting your nonlinear dynamics into higher dimensions to linearize. you cross your fingers and hope that your eigenbasis isn't continuous so some discrete, finite representation is good enough

for llms though, you explicitly break markov assumptions during post-training

1 month ago 1 1 0 0
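the lifting trick in the post above can be sketched numerically. a minimal EDMD-style example (the toy map, dictionary, and all names here are hypothetical): a nonlinear map whose observable dictionary closes, so the lifted dynamics are exactly linear and a least-squares fit recovers them.

```python
import numpy as np

# nonlinear map: x1' = a*x1, x2' = b*x2 + c*x1^2  (toy system, illustrative)
a, b, c = 0.9, 0.5, 0.3

def step(x):
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])

def lift(x):
    # observables psi = (x1, x2, x1^2): a finite dictionary that closes,
    # i.e. the map acts linearly on it -- the lucky discrete-spectrum case
    return np.array([x[0], x[1], x[0] ** 2])

# snapshot pairs from random initial conditions
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
Psi = np.array([lift(x) for x in X])
PsiNext = np.array([lift(step(x)) for x in X])

# least-squares fit of the linear operator K with Psi @ K.T ≈ PsiNext
K = np.linalg.lstsq(Psi, PsiNext, rcond=None)[0].T

x0 = np.array([0.7, -0.4])
err = np.max(np.abs(K @ lift(x0) - lift(step(x0))))
print(err)  # essentially zero: the finite representation is exact here
```

when the spectrum is continuous, no finite dictionary closes like this — that's the finger-crossing.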

as in the result of the paper (scaling your encoder/decoder gets you ~nothing)? that seems entirely implausible to me! or at least it shouldn’t be true when you’re simulating actual physics instead of human recognizable video

1 month ago 0 0 1 0

a big issue here is (imo) the models aren’t well calibrated so i’m not entirely sold on even fisher as a metric that means much of anything if you use any sort of real model

1 month ago 0 0 0 0
Preview
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Visual tokenization via auto-encoding empowers state-of-the-art image and video generative models by compressing pixels into a latent space. Although scaling Transformer-based generators has been cent...

yes (and it usually doesn't replicate lol). i am currently in complete disbelief of this (empirical!) result. i straight up do not believe this is possible.

arxiv.org/abs/2501.09755

1 month ago 3 0 1 0

is that super surprising? at low precision all these KL methods go crazy unless you're very very careful.

1 month ago 3 0 1 0

this metric is going to exactly match the hessian of your loss function with respect to the parameters if the model is perfectly calibrated (you don't have the exact true distribution, but you have the first and second moments of the score)

1 month ago 1 0 1 0
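the calibration claim is easy to check numerically. a toy sketch (bernoulli model with a logit parameter, purely illustrative): when the data is sampled from the model itself, the monte carlo fisher E[score²] matches the negative hessian of the log-likelihood.

```python
import numpy as np

# toy bernoulli model: log p(y|theta) = y*theta - log(1 + e^theta)
theta = 0.3
p = 1.0 / (1.0 + np.exp(-theta))  # sigmoid

rng = np.random.default_rng(0)
y = rng.binomial(1, p, size=200_000)  # samples FROM the model = calibrated

score = y - p                  # d/dtheta log p(y|theta) in the logit param
fisher_mc = np.mean(score ** 2)
hessian = p * (1 - p)          # -d2/dtheta2 log p(y|theta), same for all y

print(fisher_mc, hessian)      # agree up to monte carlo noise
```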

so you instead look at the tangent space at each point of your manifold (the riemannian picture). the fisher metric is the one where you do all this using the KL notion of distance, and it turns out doing this for your tangent bundle is unique up to a scaling.

1 month ago 3 0 2 0

so you have some family of distributions parameterized by θ that live on your manifold Θ. you want to equip this manifold with some sort of inner product so you can do all the things you like to do when you have an inner product. you also want to make this not rely on your choice of parameterization.

1 month ago 2 0 1 0
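the parameterization-independence can be seen on a one-parameter toy family (illustrative only): the fisher metric transforms like a riemannian metric, g_new = Jᵀ g_old J, so squared lengths of tangent vectors come out the same in any coordinates.

```python
import numpy as np

# bernoulli family, first in the p coordinate, then in t = logit(p)
p = 0.3
fisher_p = 1.0 / (p * (1 - p))      # fisher metric in the p coordinate

# change of coordinates t = log(p/(1-p)), jacobian dp/dt = p*(1-p)
J = p * (1 - p)
fisher_t_pulled_back = J * fisher_p * J   # J^T g J (all scalars here)

fisher_t_direct = p * (1 - p)       # fisher computed directly in logit coords
print(fisher_t_pulled_back, fisher_t_direct)  # identical
```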

any specific goal with this analysis in mind? ive become sort of blackpilled on all these metrics, i think they are only narrowly useful

1 month ago 0 0 1 0

the fisher of the weights tells you how much information they carry. also tells you what perturbations your model is most sensitive to in a KL sense. there are many metrics you could choose but fisher is the ~unique riemannian metric. for a perfectly calibrated model it is exactly the hessian

1 month ago 3 0 1 0
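the "sensitive in a KL sense" part has a concrete form: for a small parameter perturbation d, KL(p_θ ‖ p_θ₊d) ≈ ½ dᵀ F d. a sketch on a toy model (categorical distribution parameterized by softmax logits, illustrative only):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.2, -0.5, 1.0])
p = softmax(z)

# fisher of softmax logits in closed form: F = diag(p) - p p^T
F = np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
d = 1e-3 * rng.standard_normal(3)

q = softmax(z + d)
kl_exact = np.sum(p * np.log(p / q))
kl_quadratic = 0.5 * d @ F @ d
print(kl_exact, kl_quadratic)  # agree to leading order in |d|
```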

you can check this yourself on gpt-oss 20b with high reasoning in the prompt vs low reasoning in the prompt and any entropy reduction method. or if you have access to an 80 gig card, ablate the 120b.

4 months ago 3 0 0 0

now that i have sold out and started working on these: it is because the big labs figured out local entropy reduction techniques are very effective, and they aggressively tune that knob

4 months ago 3 0 1 0
Preview
WeatherNext 2: Our most advanced weather forecasting model The new AI model delivers more efficient, more accurate and higher-resolution global weather predictions.

completely unrelatedly, i am now ~fully convinced that there isn't a single real world smooth mapping that you can't capture by diffusing the correct amount in the correct space

blog.google/technology/g...

5 months ago 4 0 0 0

at the same time, different channels will have different overall power spectra (that a full rank representation preserves) and so good latents must be doing some sort of spatial mixing directly, and the diffusion models must untangle that and step *down* in dimensionality while increasing dof

7 months ago 0 0 0 0

because any information noised in the forward process cannot be seen later, these models always encode a hierarchical series of representations. but latent space is much closer to full rank than the target data manifold (a perfect one would be exactly full rank)

7 months ago 2 0 1 0
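a toy numerical version of the rank claim (synthetic data, not a real VAE latent): pixel-like data with a steeply decaying channel spectrum has low effective rank, while a whitened, latent-like version of the same data is near full rank.

```python
import numpy as np

def effective_rank(X):
    # exp of the entropy of the normalized singular value distribution
    s = np.linalg.svd(X, compute_uv=False)
    q = s / s.sum()
    q = q[q > 0]
    return float(np.exp(-np.sum(q * np.log(q))))

rng = np.random.default_rng(0)
decay = 1.0 / np.arange(1, 65) ** 1.5        # assumed power-law spectrum
pixels = rng.standard_normal((1000, 64)) * decay
latents = pixels / pixels.std(axis=0)        # whiten per channel

print(effective_rank(pixels), effective_rank(latents))  # small vs near 64
```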

there is. in the continuous limit the models learn the target score of the conditional distribution. but the forward process is a gaussian perturbation kernel, so the step between any two diffusion times adds white noise, so high frequency modes must drop (exponentially) faster

7 months ago 0 0 1 0
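the frequency argument can be sketched directly: under a VP forward kernel x_t = √ᾱ·x₀ + √(1−ᾱ)·ε the injected noise is white, so per-frequency SNR(f, t) = ᾱ(t)·S(f)/(1−ᾱ(t)). with an assumed power-law signal spectrum S(f) = 1/f², high frequencies sink below the noise floor at much earlier diffusion times (schedule and spectrum here are hypothetical):

```python
import numpy as np

def abar(t, beta=1.0):
    return np.exp(-beta * t)  # simple VP-style noise schedule

def snr(f, t):
    S = 1.0 / f ** 2          # assumed signal power spectrum
    a = abar(t)
    return a * S / (1.0 - a)

# first diffusion time where each frequency's SNR drops below 1
t = np.linspace(0.001, 2.0, 2000)
t_cross = {f: float(t[np.argmax(snr(f, t) < 1.0)]) for f in (1.0, 4.0, 16.0)}
print(t_cross)  # crossing time shrinks fast as frequency grows
```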
Post image

lmao that these might print

10 months ago 0 0 0 0

this is easily one of the top 3 worst trades in nba history. david stern is furiously fighting his way out of hell to stop this

1 year ago 0 0 0 0

shoutout to deepseek, showing you can just bolt on CoT with direct rl if your base model is good enough

1 year ago 0 0 0 0

california basically needs to remove its entire regulatory state at this point or the people are going to elect democrat hitler

1 year ago 2 0 1 0

hard to say with insurance its very regulated and im not an insurance guy. the broad issues are fraud and states making it unprofitable to service. definitely more parametric structures in the policies, but idk if those are even legal to offer to consumers (mb bypasses california's idiot laws?)

1 year ago 1 0 1 0

there are already hurricane binary options, but there are some otc parametric structures (so like wind pressure, rainfall). i'm sure somebody has something similar for fire, but that is still otc. the big volume rn are temperature based contracts (LNG hedge)

1 year ago 1 0 1 0

unrelated: you see this paper openreview.net/pdf?id=gojL6...

if this works on fluid dynamics im gonna lose my shit

1 year ago 1 0 0 0

resolving to reply to more posts with lol in 2025

1 year ago 1 0 0 0

this person is going to the reeducation camps when i take control

1 year ago 0 0 0 0

i thought about doing something like this for fusion operations with triton or MLIR, but i think that's actually just a full phd topic of work because i'd need to develop some sort of proof engine for it

1 year ago 1 0 0 0

some lab needs to give me 50000 h200s so i can implement an implicit runge kutta token sampler that costs 3.5 million dollars per inference run and outputs "i don't feel like doing that right now" 50% of the time

1 year ago 8 0 0 0

if your children don't venerate Urkel thought theyre ngmi

1 year ago 12 0 0 0

yeah it's basically greenfield and its the sort of problem where throwing money into a furnace gets you better solutions for a while

1 year ago 6 0 1 0