Fantastic analysis from the OpenADMET team (Maria Castellanos, Hugo MacDermott-Opeskin) showing that the zero-shot ADMET models ADMETlab 3.0 and ADMET-AI generalize poorly to their recent OpenADMET-ExpansionRx Blind Challenge data openadmet.ghost.io/zero-shot-ex...
Posts by Anthony Gitter
The Critical Assessment of Structure Prediction (CASP) experiment is calling for prediction targets: Immune Complexes, Organic Ligand-Protein Complexes, Nucleic Acids and Complexes, Conformational Ensembles, Difficult Protein Structures and Complexes. Rule of Thumb: If AlphaFold3 can generate a high-quality model, it is likely not a CASP-grade challenge. If it struggles, we want it.
Is #AI hitting a plateau in structure prediction? Help us find out at CASP17! 🧪🧬
Calling for Targets: Immune Complexes, protein-ligand complexes, RNA/DNA, conformational ensembles, membrane proteins, viral origins, and large complexes.
The Rule of Thumb: If AF3 can’t model it, we want it.
We have started a project trying to predict the interactions/structures of all yeast protein pairs using an AlphaFold pooling approach. We are making the current dataset open and we welcome collaborations.
www.evocellnet.com/2026/03/mapp...
Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...
👋 from the Nexus
I still haven't built up my network here so my following patterns are a narrow slice of my interests.
Can proteins fold and function with half of the amino acid alphabet?
Using only 10 residues, we designed stable, mutation-resilient structures—no aromatics or basics involved.
A minimalist foundation for ancient biology and synthetic design. tinyurl.com/37t8br4v
#ProteinDesign #OriginsOfLife
Mingchen replied to me on Twitter that it's also on bioRxiv now www.biorxiv.org/content/10.6...
My time in @martinsteinegger.bsky.social's group is ending, but I’m staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org
I'm really excited to break up the holiday relaxation time with a new preprint that benchmarks AlphaFold3 (AF3)/“co-folding” methods with 2 new stringent performance tests.
Thread below - but first some links:
A longer take:
fraserlab.com/2025/12/29/k...
Preprint:
www.biorxiv.org/content/10.6...
New preprint🚨
Imagine (re)designing a protein via inverse folding. AF2 predicts the designed sequence to a structure with pLDDT 94 & you get 1.8 Å RMSD to the input. Perfect design?
What if I told u that the structure has 4 solvent-exposed Trp and 3 Pro where a Gly should be?
Why to be wary🧵👇
Cody also put in a ton of extra work to make the code organized and usable in the GitHub repo: github.com/Anantharaman...
It links to a Colab notebook for model inference, training data, and pretrained models.
Excited for our new paper on a genome language model for viruses in @natcomms.nature.com: "Protein Set Transformer: a protein-based genome language model to power high-diversity viromics"! Led by PhD student Cody Martin in collaboration with @anthonygitter.bsky.social
doi.org/10.1038/s414...
Thanks, I didn't realize Rogue Scholar minted DOIs
Use @prereview.bsky.social for preprints and something else for other manuscripts?
What are good places to post an unsolicited manuscript peer review these days? I don't have a blog. I read manuscripts across arXiv, bioRxiv, ChemRxiv, OpenReview, random white papers, journals, etc. Do I dump it on Zenodo, post it here, and send it to the authors?
Our Assay2Mol manuscript was published at EMNLP 2025 doi.org/10.18653/v1/...
See the preprint thread below for a summary of the methodology, results, and code. We added more control experiments in this version related to protein sequence identity and generated molecule size.
@hkws.bsky.social and I are creating the Madison AI for Proteins (MAIP) group to discuss early-stage research at monthly meetups, share computational resources, and grow this local community. Visit mad-ai-proteins.github.io to sign up for announcements and watch for our 2026 events.
This looks like a fantastic resource to study human kinase signalling. So much MS instrument time.
Something fun and sciencey is coming soon to Madison
Looks very interesting. Can I think of this like a more extreme form of the evotuning from UniRep or doi.org/10.1101/2024... except it uses one sequence instead of the sequence plus homologs?
Bioconductor R package: bioconductor.org/packages/MPAC
Shiny app to explore results in manuscript: connect.doit.wisc.edu/content/122/
MPAC uses PARADIGM as the probabilistic model but makes many improvements:
- data-driven omic data discretization
- permutation testing to eliminate spurious predictions
- full workflow and downstream analyses in an R package
- Shiny app for interactive visualization
Overview of the MPAC workflow. MPAC calculates inferred pathway levels (IPLs) from real and permuted CNA and RNA data. It filters real IPLs using the permuted IPLs to remove spurious IPLs. Then, MPAC focuses on the largest pathway subset network with filtered IPLs to compute GO term enrichment, predict patient groups, and identify key group-specific proteins.
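A minimal sketch of the permutation-filtering idea described above, not MPAC's actual implementation: each real IPL is compared with the IPLs computed from permuted data for the same pathway entity and kept only if it stands out from that null distribution. The function name `filter_ipls`, the mean ± `n_sd` standard deviations rule, and setting filtered values to 0 are all assumptions made for illustration.

```python
from statistics import mean, stdev


def filter_ipls(real, permuted, n_sd=2.0):
    """Keep a real IPL only if it lies outside the permuted (null) range.

    real:     dict mapping pathway entity -> IPL from real data
    permuted: dict mapping pathway entity -> list of IPLs from permuted data
    n_sd:     hypothetical cutoff, in standard deviations of the null IPLs
    """
    filtered = {}
    for entity, value in real.items():
        null = permuted[entity]
        mu, sd = mean(null), stdev(null)
        # Values inside the null range are treated as spurious and zeroed.
        filtered[entity] = value if abs(value - mu) > n_sd * sd else 0.0
    return filtered


# Toy example with made-up numbers.
real_ipls = {"A": 5.0, "B": 0.1}
permuted_ipls = {
    "A": [0.0, 0.5, -0.5, 0.2],
    "B": [0.0, 0.5, -0.5, 0.2],
}
filtered_ipls = filter_ipls(real_ipls, permuted_ipls)
```

In this toy input, entity A survives because its IPL is far from the permuted values, while entity B is zeroed because it is indistinguishable from them.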
The journal version of our Multi-omic Pathway Analysis of Cells (MPAC) software is now out: doi.org/10.1093/bioi...
MPAC uses biological pathway graphs to model DNA copy number and gene expression changes and infer activity states of all pathway members.
I found out that Neurosnap offers ESMFold via API neurosnap.ai/service/ESMF...
I may test how many calls are possible with the free academic plan to see if it is worthwhile to update my repo.
AI + physics for protein engineering 🚀
Our collaboration with @anthonygitter.bsky.social is out in Nature Methods! We use synthetic data from molecular modeling to pretrain protein language models. Congrats to Sam Gelman and the team!
🔗 www.nature.com/articles/s41...
Does anyone know whether there's a functioning API to ESMFold?
(api.esmatlas.com/foldSequence... gives me Service Temporarily Unavailable)
The main GitHub repo github.com/gitter-lab/m... links to the extensive resources for running Rosetta simulations at scale to generate new training data, training METL models, running our models, and accessing our datasets. 8/
Fig. 6: Low-N GFP design.
We can use METL for low-N protein design. We trained METL on Rosetta simulations of GFP biophysical attributes and only 64 experimental examples of GFP brightness. It designed fluorescent variants with 5 and 10 mutations, including some whose mutations fall entirely outside the mutations in the training set. 7/
Fig. 5: Function-specific simulations improve METL pretraining for GB1.
A powerful aspect of pretraining on biophysical simulations is that the simulations can be customized to match the protein function and experimental assay. Our expanded simulations of the GB1-IgG complex with Rosetta InterfaceAnalyzer improve METL predictions of GB1 binding. 6/
Fig. 3: Comparative performance across extrapolation tasks.
We also benchmark METL on four types of difficult extrapolation. For instance, positional extrapolation provides training data from some sequence positions and tests predictions at different sequence positions. Linear regression completely fails in this setting. 5/
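A minimal sketch of a positional extrapolation split, assuming each variant is stored as a list of `(position, new_amino_acid)` mutations plus a score. The function name `positional_split` and the choice to drop variants that mix training and held-out positions are illustrative assumptions, not the METL benchmark code.

```python
def positional_split(variants, train_positions):
    """Split variants so that test mutations occur only at unseen positions.

    variants:        list of (mutations, score), where mutations is a list
                     of (position, new_amino_acid) tuples
    train_positions: set of sequence positions allowed in training
    """
    train, test = [], []
    for mutations, score in variants:
        positions = {pos for pos, _ in mutations}
        if positions <= train_positions:
            # Every mutated position is a training position.
            train.append((mutations, score))
        elif positions.isdisjoint(train_positions):
            # Every mutated position is held out.
            test.append((mutations, score))
        # Variants mixing both kinds of positions are dropped (assumption).
    return train, test


# Toy example with made-up variants and scores.
variants = [
    ([(1, "A")], 0.5),
    ([(2, "G"), (3, "L")], 1.2),
    ([(1, "A"), (5, "K")], 0.1),
    ([(5, "K")], 0.9),
    ([(6, "W"), (5, "K")], 0.3),
]
train, test = positional_split(variants, train_positions={1, 2, 3})
```

With a one-hot encoding of mutations, a linear model receives no training signal for the held-out positions, which is one plausible reading of why linear regression fails in this setting.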