lol
Posts by scoff manifesto
yeah. it's the same as lifting your nonlinear dynamics into higher dimensions to linearize. you cross your fingers and hope that your eigenbasis isn't continuous so some discrete, finite representation is good enough
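a minimal sketch of the lifting trick (EDMD-style), assuming a toy quadratic map and a hand-picked polynomial dictionary (both illustrative choices, not anything canonical):

```python
# lift a nonlinear map into a space of polynomial observables and fit a
# finite linear (koopman) approximation by least squares. toy example.
import numpy as np

def step(x):
    # nonlinear map: x1' = 0.9*x1, x2' = 0.8*x2 + 0.1*x1**2
    return np.array([0.9 * x[0], 0.8 * x[1] + 0.1 * x[0] ** 2])

def lift(x):
    # dictionary of observables, the "higher dimensions" you lift into,
    # picked here so the lifted dynamics actually close: (x1, x2, x1^2)
    return np.array([x[0], x[1], x[0] ** 2])

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))   # random states
Y = np.array([step(x) for x in X])          # their images under the map

Psi_X = np.array([lift(x) for x in X])
Psi_Y = np.array([lift(y) for y in Y])

# least squares Psi_X @ A ~= Psi_Y gives the (transposed) finite koopman
# approximation acting on the lifted coordinates
A, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)
print(np.sort(np.linalg.eigvals(A).real))   # ~[0.8, 0.81, 0.9]
```

this dictionary closes exactly, so the fitted eigenvalues are the true ones. on a real system the dictionary won't close, which is the finger-crossing part.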
for llms though, you explicitly break markov assumptions during post-training
as in the result of the paper (scaling your encoder/decoder gets you ~nothing)? that seems entirely implausible to me! or at least it shouldn’t be true when you’re simulating actual physics instead of human recognizable video
a big issue here is (imo) the models aren't well calibrated, so i'm not entirely sold on even fisher as a metric that means much of anything if you use any sort of real model
yes (and it usually doesn't replicate lol). i am currently in complete disbelief of this (empirical!) result. i straight up do not believe this is possible.
arxiv.org/abs/2501.09755
is that super surprising? at low precision all these kl methods go crazy unless you're very very careful.
this metric is going to perfectly match the hessian of your loss function with respect to the parameters if the model is perfectly calibrated (you don't have the exact target distribution, but you do have the first and second moments of the score)
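spelling that calibrated case out (standard identities, nothing model-specific): with nll loss

$$
\mathcal{L}(\theta) = \mathbb{E}_{x \sim p_\mathrm{data}}\!\left[-\log p_\theta(x)\right],
\qquad
\nabla_\theta^2 \mathcal{L}(\theta) = \mathbb{E}_{x \sim p_\mathrm{data}}\!\left[-\nabla_\theta^2 \log p_\theta(x)\right],
$$

while

$$
F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top}\right]
= \mathbb{E}_{x \sim p_\theta}\!\left[-\nabla_\theta^2 \log p_\theta(x)\right]
$$

(the second equality from differentiating $\int p_\theta = 1$ twice). perfect calibration means $p_\mathrm{data} = p_\theta$, so the two expectations coincide and $F(\theta) = \nabla_\theta^2 \mathcal{L}(\theta)$.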
so you instead look at the tangent space at each point of your manifold (the riemannian structure). the fisher metric is the one you get when you do all this using the KL notion of distance, and it turns out doing this for your tangent spaces is unique up to a scaling.
so you have some family of distributions p_θ that live on your manifold Θ. you want to equip this manifold with some sort of inner product so you can do all the things you like to do when you have an inner product. you also want this to not rely on your choice of parameterization.
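concretely (standard definitions matching the setup above): the fisher metric is

$$
g_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[\partial_i \log p_\theta(x)\, \partial_j \log p_\theta(x)\right],
\qquad
D_{\mathrm{KL}}\!\left(p_\theta \,\|\, p_{\theta + d\theta}\right) = \tfrac{1}{2}\, d\theta^{\top} g(\theta)\, d\theta + O(\|d\theta\|^3),
$$

and under a reparameterization $\phi = f(\theta)$ with jacobian $J = \partial\phi/\partial\theta$ it transforms as $g_\theta = J^{\top} g_\phi J$, so inner products of tangent vectors don't depend on the chart. the "unique up to scaling" part is chentsov's theorem.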
any specific goal with this analysis in mind? i've become sort of blackpilled on all these metrics, i think they are only narrowly useful
the fisher of the weights tells you how much information they carry. it also tells you what perturbations your model is most sensitive to in a KL sense. there are many metrics you could choose but fisher is the ~unique riemannian metric. for a perfectly calibrated model it is exactly the hessian
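if you want to poke at this numerically, here's a rough monte-carlo sketch of the diagonal fisher for a toy model (model, sizes, and names all mine):

```python
# monte-carlo estimate of the diagonal fisher of a model's weights.
# note y is sampled from the *model's* predictive distribution (the true
# fisher), not reused from dataset labels (the "empirical fisher").
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 4)      # stand-in for your network
xs = torch.randn(128, 16)           # some inputs

fisher_diag = [torch.zeros_like(p) for p in model.parameters()]
for xi in xs:
    logits = model(xi.unsqueeze(0))
    y = torch.distributions.Categorical(logits=logits).sample()
    nll = F.cross_entropy(logits, y)
    grads = torch.autograd.grad(nll, list(model.parameters()))
    for fd, g in zip(fisher_diag, grads):
        fd += g.detach() ** 2 / len(xs)

# big entries: perturbing that weight moves the model a lot in KL.
# near-zero entries: that weight carries ~no information.
```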
you can check this yourself on gpt-oss 20b with high reasoning in the prompt vs low reasoning in the prompt and any entropy reduction method. or if you have access to an 80 gig card, ablate the 120b.
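roughly the check i mean, as a sketch: this assumes the hf transformers generate api and that gpt-oss reads its reasoning level from a "Reasoning: high" / "Reasoning: low" line in the system prompt.

```python
# compare mean next-token entropy under high vs low reasoning prompts
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")

def mean_entropy(system_prompt, question, max_new_tokens=256):
    msgs = [{"role": "system", "content": system_prompt},
            {"role": "user", "content": question}]
    ids = tok.apply_chat_template(
        msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=True,
                         output_scores=True, return_dict_in_generate=True)
    # entropy of the sampling distribution at each generated position
    ents = [torch.distributions.Categorical(logits=s[0]).entropy()
            for s in out.scores]
    return torch.stack(ents).mean().item()

q = "prove there are infinitely many primes"
print("high:", mean_entropy("Reasoning: high", q))
print("low:", mean_entropy("Reasoning: low", q))
```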
now that i have sold out and started working on these: it is because the big labs figured out local entropy reduction techniques are very effective, and they aggressively tune that knob
completely unrelatedly, i am now ~fully convinced that there isn't a single real world smooth mapping that you can't capture by diffusing the correct amount in the correct space
blog.google/technology/g...
at the same time, different channels will have different overall power spectra (that a full rank representation preserves) and so good latents must be doing some sort of spatial mixing directly, and the diffusion models must untangle that and step *down* in dimensionality while increasing dof
because any information drowned out by noise in the forward process can't be seen at later diffusion times, these models always encode a hierarchical series of representations. but latent space is much closer to full rank than the target data manifold (a perfect latent would be exactly full rank)
there is. in the continuous limit the models learn the target score of the conditional distribution. but the forward process is a gaussian perturbation kernel, so the increment between any two diffusion times is white noise, and high frequency modes must drop (exponentially) faster
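concretely, assuming a VP/ddpm-style kernel and a power-law signal spectrum $S(k) \sim k^{-\alpha}$ (typical of natural data):

$$
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \varepsilon,
\quad \varepsilon \sim \mathcal{N}(0, I)
\;\Rightarrow\;
\mathrm{SNR}(k, t) = \frac{\bar\alpha_t\, S(k)}{1 - \bar\alpha_t},
$$

since the noise is white (flat in $k$) while the signal decays in $k$. the per-mode snr falls below 1 earliest at high $k$: fine detail drowns first in the forward process and must be committed last in reverse, which is exactly the coarse-to-fine hierarchy.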
lmao that these might print
this is easily one of the top 3 worst trades in nba history. david stern is furiously fighting his way out of hell to stop this
shoutout to deepseek, showing you can just bolt on CoT with direct rl if your base model is good enough
california basically needs to remove its entire regulatory state at this point or the people are going to elect democrat hitler
hard to say with insurance, it's very regulated and i'm not an insurance guy. the broad issues are fraud and states making it unprofitable to service. definitely more parametric structures in the policies, but idk if those are even legal to offer to consumers (mb bypasses california's idiot laws?)
there are already hurricane binary options, but there are some otc parametric structures (so like wind pressure, rainfall). i'm sure somebody has something similar for fire, but that is still otc. the big volume rn is temperature-based contracts (LNG hedge)
unrelated: you see this paper openreview.net/pdf?id=gojL6...
if this works on fluid dynamics im gonna lose my shit
resolving to reply to more posts with lol in 2025
this person is going to the reeducation camps when i take control
i thought about doing something like this for op fusion with triton or MLIR, but i think that's actually just a full phd topic of work because i'd need to develop some sort of proof engine for it
some lab needs to give me 50000 h200s so i can implement an implicit runge kutta token sampler that costs 3.5 million dollars per inference run and outputs "i don't feel like doing that right now" 50% of the time
if your children don't venerate Urkel thought they're ngmi
yeah it's basically greenfield and it's the sort of problem where throwing money into a furnace gets you better solutions for a while