The Machine Learning for Fundamental Physics School is back! This time bigger and in a new location: Georgia Institute of Technology (@gtsciences.bsky.social)!
We can cover the domestic travel and accommodation of most students and ECRs participating. Apply soon!
indico.global/event/17000/
Posts by William Gilpin
Delighted this paper is out! Soft solids fracture in complex ways. Can we control it using structure and activity? Yes, using defects that localize energy injection for targeted failure! Amazing work combining exp, theory & ML by Sheng Chen and collab with Murrell lab (Yale).
GitHub: github.com/williamgilpi...
Preprint: arxiv.org/abs/2602.18679
Joint work with Anthony Bao and Jeff Lai
The work was inspired by recent works on small transformers, which isolate surprising behaviors like in-context k-nearest neighbors, and even in-context learning of small MLPs. See Garg et al. 2022: arxiv.org/abs/2208.01066 and Reddy 2024: arxiv.org/abs/2312.03002 (10/N)
You can think of our setting as a minimized version of what SciML foundation models (FMs) do. For example, PDE FMs are trained at one Reynolds number, but can forecast at different Reynolds numbers. Likewise, physiology FMs zero-shot forecast new subjects, who are likely distinct dynamical systems. (9/N)
The agreement improves as out-of-distribution loss drops. We repeated these experiments across a hundred different models trained on different systems, using the dysts ODE library. (8/N)
The results agree, suggesting that small transformers learn transfer operators from the context during testing. They especially capture longer-lived modes, like the invariant distribution and long-lived metastable states (leading eigenvectors) (7/N)
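As a rough illustration of the transfer-operator picture in the post above, here is a minimal, hypothetical sketch (not the paper's code) of Ulam's method: estimate a Markov transition matrix from a chaotic trajectory by binning state space, then recover the invariant distribution as the leading left eigenvector. The logistic map and bin count are illustrative choices.

```python
import numpy as np

def ulam_transfer_operator(traj, n_bins=50):
    """Estimate a transfer (Markov) operator from a 1D trajectory in [0, 1]
    by counting transitions between equal-width bins (Ulam's method)."""
    bins = np.minimum((traj * n_bins).astype(int), n_bins - 1)
    counts = np.zeros((n_bins, n_bins))
    for i, j in zip(bins[:-1], bins[1:]):
        counts[i, j] += 1
    # Row-normalize visited bins to get transition probabilities
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)

# Long trajectory of the fully chaotic logistic map x -> 4x(1 - x)
x = 0.123
traj = np.empty(100_000)
for t in range(traj.size):
    traj[t] = x
    x = 4 * x * (1 - x)

P = ulam_transfer_operator(traj)
evals, evecs = np.linalg.eig(P.T)   # left eigenvectors of P
k = np.argmax(evals.real)           # leading eigenvalue is 1 (row-stochastic)
pi = np.abs(evecs[:, k].real)
pi /= pi.sum()                      # estimated invariant distribution
```

For the logistic map the true invariant density peaks at the edges of [0, 1], and the leading eigenvector reproduces that shape; slower-decaying eigenvectors capture longer-lived metastable structure.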
So transformers infer when tokens arise from a higher-dimensional attractor. How does this enable OOD forecasts? We sample transitions between pairs of k-grams (time-delay embedded inputs), and compare to transition probabilities on the original, full state space. (6/N)
How is this possible? We extract the conditional probabilities of the small transformer and find that it time delay embeds its univariate input. When test data comes from a higher dimensional system, attention rollouts become higher-rank, i.e. an adaptive delay embedding. (5/N)
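The time-delay embedding described in the post above can be sketched generically (an illustration of the standard Takens construction, not the paper's implementation): a univariate series is lifted into overlapping k-dimensional delay vectors, which can unfold a higher-dimensional attractor.

```python
import numpy as np

def delay_embed(x, k, tau=1):
    """Map a 1D series x into overlapping k-dimensional delay vectors
    (x[t], x[t+tau], ..., x[t+(k-1)*tau]), as in Takens embedding."""
    n = len(x) - (k - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(k)], axis=1)

x = np.arange(10.0)
X = delay_embed(x, k=3, tau=2)
# X[0] is [0., 2., 4.] and X has shape (6, 3)
```

Each row of X plays the role of a k-gram: a short window of the scalar series standing in for a point on the underlying attractor.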
We can also see this in the loss curves. We see epoch-wise double descent for our in-distribution test data (a different trajectory from the same ODE), but we see a second double descent for out-of-distribution data from an unseen ODE. (4/N)
In our new paper, we shrink this phenomenon down: we fully train a small Chronos-like transformer to forecast exactly one dynamical system, and then test its ability to forecast a second dynamical system. Even in this restricted setting, it works much better than expected. (3/N)
Last year, we noticed that off-the-shelf time series foundation models, which never saw ODEs during training, forecast chaotic systems surprisingly well, even without fine-tuning. (2/N)
How do time series foundation models forecast unseen dynamical systems? In new experiments, we find that small transformers learn to approximate transfer operators in-context. (1/N)
arxiv.org/abs/2602.18679
RCSA welcomes 24 early career teacher-scholars in chemistry, physics, and astronomy as recipients of its 2026 #CottrellScholar Awards. Each awardee receives $120,000. Congratulations to this exceptional class!
Congratulations!
I’m very happy to announce that I will be joining #Seoul National University’s #SNU #physics department as Assistant Professor in Fall of 2026!
🚨 POSTDOC OPENING 🚨
NIH-funded Bio-Fluid Mechanics Postdoc in my lab @univmiami.bsky.social
Hofstenia miamia | cilia-driven flows | behavior & neuroscience
Collab w/ Mansi Srivastava @harvard.edu
🕒 Start: Jan–Feb 2026
⏳ 1 yr, renewable | Email me ASAP!
#Postdoc #Biophysics #FluidDynamics
I'm excited to say that one of the most exploratory and thought-provoking papers I've worked on in recent years was just accepted at Physical Review Research.
Preprint here: arxiv.org/abs/2502.21072
#physics #innovation 🧪🦋 @apsphysics.bsky.social
Kudos to Edoardo Baldini, William Gilpin & Daehyeok Kim on earning Faculty Early Career Development Program (CAREER) Awards from the National Science Foundation!
#NSF #CAREERAwards #EarlyCareerDevelopment #TexasScience @wgilpin.bsky.social @utphysics.bsky.social
cns.utexas.edu/news/accolad...
hi david, thank you very much :)
Paper: doi.org/10.1371/jour...
Code: github.com/williamgilpi...
Explanatory Website & Code demo: williamgilpin.github.io/illotka/demo...
This work was inspired by amazing recent work on transients by the dynamical systems community: analogue k-SAT solvers, slowdowns in gradient descent during neural network training, and chimera states in coupled oscillators. (12/N)
For the Lotka-Volterra case, optimal coordinates are the right singular vectors of the species interaction matrix. You can experimentally estimate these with O(N) operations using Krylov-style methods: perturb the ecosystem, and see how it reacts. (11/N)
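The Krylov-style idea in the post above can be sketched as power iteration on AᵀA: each application of the interaction matrix A plays the role of one "perturb and observe the response" experiment, and the iterate aligns with the leading right singular vector. A hypothetical toy version (a random stand-in matrix, not an estimated ecosystem):

```python
import numpy as np

def leading_right_singular_vector(A, n_iter=1000, seed=0):
    """Power iteration on A^T A: each step applies A (one 'perturbation')
    and then A^T, so only matrix-vector products are needed."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    for _ in range(n_iter):
        v = A.T @ (A @ v)
        v /= np.linalg.norm(v)
    return v

rng = np.random.default_rng(42)
A = rng.standard_normal((20, 20))   # stand-in species interaction matrix
v_est = leading_right_singular_vector(A)
_, _, Vt = np.linalg.svd(A)
# v_est matches the top right singular vector Vt[0] up to sign
```

Proper Krylov methods (e.g. Lanczos bidiagonalization) converge faster, but the point is the same: only matrix-vector products, i.e. perturbation-response experiments, are required.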
This variation influences how we reduce the dimensionality of biological time series. With non-reciprocal interactions (like predator-prey), PCA won’t always separate timescales. The optimal dimensionality-reducing variables (“ecomodes”) should precondition the linear problem. (10/N)
As a consequence of ill-conditioning, large ecosystems become excitable: small changes cause huge differences in how they approach equilibrium. Using the FLI (fast Lyapunov indicator), a metric invented by astrophysicists to study planetary orbits, we see caustics indicating variation in the solution path. (9/N)
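The FLI mentioned in the post above is just the largest accumulated log-growth of a tangent vector along a trajectory. A toy, hypothetical sketch on the Chirikov standard map (a workhorse of the planetary-dynamics literature, not the ecosystems in the paper):

```python
import numpy as np

def fli_standard_map(theta, p, K=1.5, n_steps=1000):
    """Fast Lyapunov Indicator: evolve a tangent vector along an orbit of
    the Chirikov standard map, tracking the largest accumulated log-norm."""
    v = np.array([1.0, 0.0])
    log_growth, fli = 0.0, 0.0
    for _ in range(n_steps):
        c = K * np.cos(theta)
        # Variational step: multiply v by the map's Jacobian at the current point
        v = np.array([(1 + c) * v[0] + v[1], c * v[0] + v[1]])
        # Map step: p' = p + K sin(theta), theta' = theta + p'
        p = p + K * np.sin(theta)
        theta = (theta + p) % (2 * np.pi)
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)
        fli = max(fli, log_growth)
        v /= norm  # renormalize so the tangent vector never overflows
    return fli

fli_chaotic = fli_standard_map(0.1, 0.1)           # orbit in the chaotic sea
fli_regular = fli_standard_map(np.pi + 0.05, 0.0)  # orbit in a stable island
```

Chaotic orbits give FLI growing linearly in time, regular ones only logarithmically, which is how the indicator separates the two at modest cost.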
How would hard optimization problems arise in nature? I used genetic algorithms to evolve ecosystems towards supporting more biodiversity, and they became more ill-conditioned—and thus more prone to supertransients. (8/N)
So ill-conditioning isn’t just something numerical analysts care about. It’s a physical property that measures computational complexity, which translates to super long equilibration times in large biological networks with trophic overlap (7/N)
More precisely: the expected equilibration time of a random Lotka-Volterra system scales with the condition number of the species interaction matrix. The scaling matches the expected scaling of the solvers that your computer uses to do linear regression (6/N)
We can think of ecological dynamics as an analogue constraint satisfaction problem. As the problem becomes more ill-conditioned, the ODEs describing the system take longer to “solve” the problem of who survives and who goes extinct (5/N)
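The "analogue solver" picture in the post above can be made concrete with a toy generalized Lotka-Volterra sketch (a hypothetical construction, not the paper's setup): choosing an interior equilibrium x* = -A⁻¹r, integrating the ODE relaxes the ecosystem onto the solution of the linear system A x = -r.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = -(M @ M.T / n + np.eye(n))  # negative definite => stable interior equilibrium
x_star = np.ones(n)             # target equilibrium where all species coexist
r = -A @ x_star                 # growth rates consistent with that equilibrium

# Generalized Lotka-Volterra dynamics: dx_i/dt = x_i * (r_i + (A x)_i)
x = 0.5 * np.ones(n)
dt = 0.01
for _ in range(50_000):
    x = x + dt * x * (r + A @ x)

# At convergence the ecosystem has "solved" the linear system A x = -r
```

The equilibration time of this analogue computation is what the thread argues scales with the condition number of A, just like iterative linear solvers.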
But is equilibrium even relevant? In high dimensions, stable fixed points might not be reachable in finite time. Supertransients arise when unstable solutions trap dynamics for increasingly long durations. E.g., pipe turbulence is supertransient (even though laminar flow is globally stable). (4/N)