In practice that means using the OMol or OMat standards unless you really need something different, I guess, as those are the biggest datasets out there?
Posts by Tim Duignan
When generating training data on specific systems, the MLIP/NNP community needs to get organized and agree on one particular level of theory, settings, and data storage standards as much as possible, so we can pool all the data for training foundation/universal models, right?
I have no idea how long this period will last. But it may go faster than we realise. I have no idea what comes after. But I know I really don't want to miss it.
We may be just at the beginning of a period where individual scientists working with a team of AI agents will be able to get an immense amount done. Particularly for computational tasks that don't require physical experiments.
When agentic AI started popping up I remember thinking, yeah, that's the logical next step but we're not there yet, it's several years away. Well, I think it's here now, and it came much faster than I expected. It's still early days but it seems it will have a profound impact.
We're beginning to understand the properties and formation of this crucially important complex material (the SEI) in great detail using NNPs/MLIPs. Now we can start using that knowledge to really design and engineer it to perfection. Exciting times ahead!
Then another one from De Angelis et al. showing a very interesting collective ring diffusion process involving six lithium ions in LiF, a key component of the SEI. (chemrxiv.org/engage/chemr...)
I actually observed these pairs in water a while ago with an NNP and worried a lot that they were a hallucination, as it is very counterintuitive that two such small cations would pair like that. But it looks like it's real, and really important! (iopscience.iop.org/article/10.1...)
James Stevenson et al. show (with experimental evidence) that lithium cations can pair up in the battery solvent and that these pairs are what form the SEI. (chemrxiv.org/engage/chemr...)
The SEI is the thin layer that forms between the surface of the graphite electrode and the liquid electrolyte in a battery; without it, lithium-ion batteries wouldn't be possible, as the graphite quickly exfoliates. Despite its importance we know very little about it.
A couple of very nice new papers on understanding SEI formation in lithium-ion batteries using neural network/machine learning interatomic potentials (NNPs/MLIPs).
Universal machine learning force fields beating tailor-made classical potentials for zeolites quite convincingly. Great to see all these benchmarking papers! It again demonstrates that accurate training data + speed should be the key focus now.
arxiv.org/abs/2509.07417
That's basically the same machine-learning problem NNPs are already solving, and there are already many demonstrations that they work well for this purpose. Some tricky problems remain, like coupling the coarse-grained and all-atom levels, but that seems solvable.
But yeah, at some point you have to use the nanosecond-scale simulations to train coarse-grained models that integrate out the short-range, high-frequency motions and learn the free energy surface.
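To make "learn the free energy surface" concrete, here's a toy sketch of my own (not from any of the papers above): estimating a 1D free energy profile F(x) = -kT ln p(x) from trajectory samples of a coordinate. Here the "trajectory" is cheap Metropolis Monte Carlo on a known double-well potential standing in for real MD, so we can see the surface come back out of the samples.

```python
import numpy as np

# Toy illustration: recover a free energy profile F(x) = -kT ln p(x)
# from sampled configurations. The Metropolis chain below is a stand-in
# for an MD trajectory; the double-well U(x) is arbitrary.

rng = np.random.default_rng(0)
kT = 1.0  # work in units of kT

def U(x):
    return (x**2 - 1.0)**2  # double well with minima at x = +/-1

# Sample exp(-U/kT) with simple Metropolis Monte Carlo
x = 0.0
samples = []
for _ in range(200_000):
    x_new = x + rng.normal(scale=0.5)
    if rng.random() < np.exp(-(U(x_new) - U(x)) / kT):
        x = x_new
    samples.append(x)
samples = np.array(samples[10_000:])  # discard equilibration

# Free energy surface from the sampled histogram
hist, edges = np.histogram(samples, bins=60, range=(-2, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = hist > 0                # skip empty bins before taking the log
F = -kT * np.log(hist[mask])
F -= F.min()                   # put the minimum at zero
```

The recovered F shows the two minima near x = ±1 and a roughly 1 kT barrier at x = 0, matching U up to sampling noise. A coarse-grained model is essentially being trained to reproduce this kind of surface (in many dimensions) from all-atom data.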
Which means you can brute-force carbonic anhydrase and potassium ion channels etc. easily, which are at the fast end admittedly, but then some smart enhanced sampling like replica exchange/metadynamics should get you the rest of the way to many important discoveries.
With distilled, optimized NNPs and a new generation of GPU clusters, we should be able to approach classical speeds (100s of ns/day).
I think neural network potentials are the eventual pathway to a virtual cell. The accuracy/memory are quite close to where you need them. Timescale is the last real hurdle. But we can port decades of great tools from classical FFs, so it’s becoming more of an engineering problem now.
arxiv.org/pdf/2508.15614 This is right and it's a big deal. Been waiting my whole career for this point. So many things to simulate!
Another very interesting benchmarking paper on NNPs. lnkd.in/gWbcTQw8 It seems the models are pretty much there. Very exciting times as these new large datasets continue to be built. Always need more though!
There are examples of how to run MD in the examples folder github.com/orbital-mate... To get a solvated protein there, download the PDB, use PDB2PQR to add hydrogens, and add solvent with ParmEd, or find a classical MD paper where they've already done this.
Nice recent example; this is an important problem for pharma: chemrxiv.org/engage/chemr...
The new models should be much better at this
MD = molecular dynamics, which is basically just a direct simulation of how the molecules behave. We haven't been able to do this accurately enough for any interesting systems until now, which we now can because of these models.
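For anyone who hasn't seen what's under the hood, a minimal sketch of what "molecular dynamics" means (illustrative only, nothing to do with the actual models discussed here): integrate Newton's equations with velocity Verlet for two particles in a Lennard-Jones potential. In NNP/MLIP-based MD the analytic force routine below is what gets replaced by a neural network evaluation.

```python
import numpy as np

# Minimal MD sketch: velocity Verlet for two Lennard-Jones particles.
# In NNP-based MD, lj_force_energy would be a neural network call instead.

def lj_force_energy(r_vec, eps=1.0, sigma=1.0):
    """Force on particle 0 and potential energy for separation vector r_vec."""
    r = np.linalg.norm(r_vec)
    sr6 = (sigma / r)**6
    energy = 4 * eps * (sr6**2 - sr6)
    f_mag = 24 * eps * (2 * sr6**2 - sr6) / r  # -dU/dr
    return f_mag * r_vec / r, energy

def run_md(n_steps=2000, dt=0.002, m=1.0):
    pos = np.array([[0.0, 0.0, 0.0], [1.3, 0.0, 0.0]])  # start near the well
    vel = np.zeros_like(pos)
    f0, pot = lj_force_energy(pos[0] - pos[1])
    forces = np.array([f0, -f0])          # Newton's third law
    energies = []
    for _ in range(n_steps):
        vel += 0.5 * dt * forces / m      # half kick
        pos += dt * vel                   # drift
        f0, pot = lj_force_energy(pos[0] - pos[1])
        forces = np.array([f0, -f0])
        vel += 0.5 * dt * forces / m      # half kick
        kin = 0.5 * m * np.sum(vel**2)
        energies.append(kin + pot)
    return pos, np.array(energies)

pos, energies = run_md()
# Symplectic integrator: total energy should be well conserved
drift = abs(energies[-1] - energies[0]) / abs(energies[0])
```

The two particles just oscillate in the potential well with essentially no energy drift; everything interesting in real MD comes from scaling this same loop up to thousands of atoms with accurate forces.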
The main limitation is that many processes occur on timescales that are too long, but we can build big molecular dynamics datasets with this model and then train coarse-grained models on those to get to longer time/spatial scales.
Many people are already doing amazing science with custom-built NNPs for many substances. The idea is that now they can skip making the training data and building the model and go straight to doing science.
It kind of has too many applications to list. But generally, for any substance you want to know its structural, kinetic, and thermodynamic properties, and all of them can in principle be derived from MD.
For generations people have been dreaming of simulating real complex chemistry like enzymes and MOFs starting from nothing but quantum mechanics. Well I really think it's here.
This means it should be possible to look at crucially important angstrom/picosecond scale phenomena like the hydrogen bond network of water and how it controls reaction dynamics etc. where we really lack adequate tools.
I'm particularly excited about how close the structure stays to experiment, with no constraints, even though the training data doesn't contain a single protein.
And you can look at huge systems with them, like a 20,000-atom solvated carbonic anhydrase enzyme with many different complex interactions going on. You can do hundreds of thousands of calculations on it with a single GPU in a few days, with no unphysical behaviour.
It's amazing to me that you can just pick general-purpose DFT validation sets and benchmark these models as if they were a DFT functional, and they will normally do a great job out of the box: often similar to a dispersion-corrected GGA or better, but orders of magnitude faster.