Anthony Gitter (@anthonygitter) Bsky

Lessons Learned from the OpenADMET-ExpansionRx Blind Challenge: Can We Trust Zero-Shot ADMET Predictions? Maria Castellanos Hugo MacDermott-Opeskin It’s been more than a month since the OpenADMET-ExpansionRx challenge wrapped up, but the conversation is just getting started. Launched on October 27, 2025...

Fantastic analysis from the OpenADMET team (Maria Castellanos, Hugo MacDermott-Opeskin) showing that the zero-shot ADMET models ADMETlab 3.0 and ADMET-AI generalize poorly to their recent OpenADMET-ExpansionRx Blind Challenge data openadmet.ghost.io/zero-shot-ex...

2 weeks ago 0 0 0 0

The Critical Assessment of Structure Prediction (CASP) experiment is calling for prediction targets: Immune Complexes, Organic Ligand-Protein Complexes, Nucleic Acids and Complexes, Conformational Ensembles, Difficult Protein Structures and Complexes. Rule of Thumb: If AlphaFold3 can generate a high-quality model, it is likely not a CASP-grade challenge. If it struggles, we want it.

Is #AI hitting a plateau in structure prediction? Help us find out at CASP17! 🧪🧬

Calling for Targets: Immune Complexes, protein-ligand complexes, RNA/DNA, conformational ensembles, membrane proteins, viral origins, and large complexes.

The Rule of Thumb: If AF3 can’t model it, we want it.

1 month ago 48 35 2 3

Mapping the yeast structural interactome with AlphaFold3: an open call for collaboration We are excited to announce the early-stage release of our S. cerevisiae structural interactome mapping project. Using AlphaFold3 (AF3), w...

We have started a project trying to predict the interactions/structures of all yeast protein pairs using an AlphaFold pooling approach. We are making the current dataset open and we welcome collaborations.
www.evocellnet.com/2026/03/mapp...

1 month ago 97 53 6 0

Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...

2 months ago 83 35 3 1

👋 from the Nexus

I still haven't built up my network here so my following patterns are a narrow slice of my interests.

2 months ago 1 0 1 0

Ancient amino acid sets enable stable protein folds Early proteins likely arose from a chemically limited set of amino acids available through prebiotic chemistry, raising a central question in molecular evolution: could such primitive compositions yie...

Can proteins fold and function with half of the amino acid alphabet?
Using only 10 residues, we designed stable, mutation-resilient structures—no aromatics or basics involved.
A minimalist foundation for ancient biology and synthetic design. tinyurl.com/37t8br4v
#ProteinDesign #OriginsOfLife
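A rough way to check a sequence against the constraints described in this post is sketched below. It only tests the two properties the post states (at most 10 distinct residue types, no aromatic or basic residues). The specific 10-residue alphabet used in the paper is not listed here, so it is not encoded; the function names and the choice to count histidine as basic are assumptions for illustration.

```python
# Minimal sketch: does a sequence fit a reduced, aromatic-free, basic-free alphabet?
# Residue classes below are standard textbook groupings, not taken from the paper.
AROMATIC = set("FWY")  # Phe, Trp, Tyr
BASIC = set("KRH")     # Lys, Arg, His (His counted as basic here by assumption)


def alphabet_report(seq: str) -> dict:
    """Summarize which residue types a sequence uses."""
    residues = set(seq.upper())
    return {
        "n_distinct": len(residues),
        "aromatic": sorted(residues & AROMATIC),
        "basic": sorted(residues & BASIC),
    }


def is_reduced(seq: str, max_residues: int = 10) -> bool:
    """True if the sequence uses at most max_residues types and no aromatic/basic ones."""
    report = alphabet_report(seq)
    return (
        report["n_distinct"] <= max_residues
        and not report["aromatic"]
        and not report["basic"]
    )
```

For example, `is_reduced("AVLDEGSTIP")` passes, while any sequence containing K or W fails.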

5 months ago 25 11 1 0

Mingchen replied to me on Twitter that it's also on bioRxiv now www.biorxiv.org/content/10.6...

2 months ago 2 0 0 0

Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning Mirdita Lab builds scalable bioinformatics methods.

My time in @martinsteinegger.bsky.social's group is ending, but I’m staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org

3 months ago 104 55 7 1

Know when to co-fold'em This is the official web page for the James Fraser Lab at UCSF.

I'm really excited to break up the holiday relaxation time with a new preprint that benchmarks AlphaFold3 (AF3)/“co-folding” methods with 2 new stringent performance tests.

Thread below - but first some links:
A longer take:
fraserlab.com/2025/12/29/k...

Preprint:
www.biorxiv.org/content/10.6...

3 months ago 72 30 5 2

New preprint🚨
Imagine (re)designing a protein via inverse folding. AF2 predicts the designed sequence to a structure with pLDDT 94 & you get 1.8 Å RMSD to the input. Perfect design?
What if I told u that the structure has 4 solvent-exposed Trp and 3 Pro where a Gly should be?

Why to be wary🧵👇

4 months ago 63 24 4 1

GitHub - AnantharamanLab/protein_set_transformer: Protein Set Transformer (PST) framework for training protein-language-model-based genome language models. Inference is possible for viral genomes using our pretrained viral foundation model.

Cody also put in a ton of extra work to make the code organized and usable in the GitHub repo: github.com/Anantharaman...

It links to a Colab notebook for model inference, training data, and pretrained models.

4 months ago 1 1 0 0

Protein Set Transformer: a protein-based genome language model to power high-diversity viromics - Nature Communications A genome language model, Protein Set Transformer, trained on viral datasets, uncovers evolutionary rules of protein content and organization driving precise virus identification, host prediction, and ...

Excited for our new paper on a genome language model for viruses in @natcomms.nature.com: "Protein Set Transformer: a protein-based genome language model to power high-diversity viromics"! Led by PhD student Cody Martin in collaboration with @anthonygitter.bsky.social

doi.org/10.1038/s414...

4 months ago 10 4 1 0

Thanks, I didn't realize Rogue Scholar minted DOIs

4 months ago 0 0 0 0

Use @prereview.bsky.social for preprints and something else for other manuscripts?

4 months ago 1 0 0 0

What are good places to post an unsolicited manuscript peer review these days? I don't have a blog. I read manuscripts across arXiv, bioRxiv, ChemRxiv, OpenReview, random white papers, journals, etc. Do I dump it on Zenodo, post it here, and send it to the authors?

4 months ago 2 1 2 0

Our Assay2Mol manuscript was published at EMNLP 2025 doi.org/10.18653/v1/...

See the preprint thread below for a summary of the methodology, results, and code. We added more control experiments in this version related to protein sequence identity and generated molecule size.

5 months ago 0 0 0 0

@hkws.bsky.social and I are creating the Madison AI for Proteins (MAIP) group to discuss early-stage research at monthly meetups, share computational resources, and grow this local community. Visit mad-ai-proteins.github.io to sign up for announcements and watch for our 2026 events.

5 months ago 1 0 0 0

This looks like a fantastic resource to study human kinase signalling. So much MS instrument time.

5 months ago 13 3 0 0

Something fun and sciencey is coming soon to Madison

5 months ago 0 0 0 1

Protein Language Model Fitness Is a Matter of Preference Leveraging billions of years of evolution, scientists have trained protein language models (pLMs) to understand the sequence and structure space of proteins aiding in the design of more functional pro...

Looks very interesting. Can I think of this like a more extreme form of the evotuning from UniRep or doi.org/10.1101/2024... except it uses one sequence instead of the sequence plus homologs?

5 months ago 3 0 1 0

MPAC: Multi-omic Pathway Analysis of Cells (MPAC) integrates multi-omic data for understanding cellular mechanisms. It predicts novel patient groups with distinct pathway profiles as well as identifying ke...

Bioconductor R package: bioconductor.org/packages/MPAC

Shiny app to explore results in manuscript: connect.doit.wisc.edu/content/122/

6 months ago 0 0 0 0

MPAC uses PARADIGM as the probabilistic model but makes many improvements:
- data-driven omic data discretization
- permutation testing to eliminate spurious predictions
- full workflow and downstream analyses in an R package
- Shiny app for interactive visualization

6 months ago 0 0 1 0

Overview of the MPAC workflow. MPAC calculates inferred pathway levels (IPLs) from real and permuted CNA and RNA data. It filters real IPLs using the permuted IPLs to remove spurious IPLs. Then, MPAC focuses on the largest pathway subset network with filtered IPLs to compute GO term enrichment, predict patient groups, and identify key group-specific proteins.
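The permutation-filtering step in this workflow can be sketched roughly as follows. This is not MPAC's code (MPAC is an R package) and the thresholding rule is an assumption: here a real IPL is kept only if it falls outside mean ± `n_sd` standard deviations of the permuted IPLs for the same pathway entity, and is otherwise set to 0. The function name `filter_ipls` and the dictionary layout are hypothetical.

```python
# Minimal sketch of filtering real IPLs against a permutation-derived null.
# Assumption: "spurious" means inside mean +/- n_sd * SD of the permuted values.
from statistics import mean, stdev
from typing import Dict, List


def filter_ipls(
    real: Dict[str, float],
    permuted: Dict[str, List[float]],
    n_sd: float = 3.0,
) -> Dict[str, float]:
    """Zero out real IPLs that are indistinguishable from permuted IPLs."""
    filtered = {}
    for entity, value in real.items():
        null = permuted[entity]  # needs at least two permuted values
        mu, sd = mean(null), stdev(null)
        low, high = mu - n_sd * sd, mu + n_sd * sd
        # Keep the value only when it lies outside the null range.
        filtered[entity] = value if (value < low or value > high) else 0.0
    return filtered
```

With `real = {"A": 2.0, "B": 0.1}` and permuted values clustered near zero for both entities, entity A keeps its IPL and entity B is set to 0.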

The journal version of our Multi-omic Pathway Analysis of Cells (MPAC) software is now out: doi.org/10.1093/bioi...

MPAC uses biological pathway graphs to model DNA copy number and gene expression changes and infer activity states of all pathway members.

6 months ago 2 1 1 0

🧬 Use ESMFold Online | Neurosnap Bulk protein structure prediction model that only requires a single amino acid sequence as input. Much faster than AlphaFold2 since no MSAs are required (but slightly less accurate too).

I found out that Neurosnap offers ESMFold via API neurosnap.ai/service/ESMF...

I may test how many calls are possible with the free academic plan to see if it is worthwhile to update my repo.

6 months ago 2 1 0 0

Biophysics-based protein language models for protein engineering - Nature Methods Mutational effect transfer learning (METL) is a protein language model framework that unites machine learning and biophysical modeling. Transformer-based neural networks are pretrained on biophysical simulation data to capture fundamental relationships between protein sequence, structure and energetics.

AI + physics for protein engineering 🚀
Our collaboration with @anthonygitter.bsky.social is out in Nature Methods! We use synthetic data from molecular modeling to pretrain protein language models. Congrats to Sam Gelman and the team!
🔗 www.nature.com/articles/s41...

6 months ago 5 1 0 0

Does anyone know whether there's a functioning API to ESMfold?

(api.esmatlas.com/foldSequence... gives me Service Temporarily Unavailable)

6 months ago 3 1 2 0

GitHub - gitter-lab/metl: Mutational Effect Transfer Learning (METL) framework for pretraining and finetuning biophysics-informed protein language models

The main GitHub repo github.com/gitter-lab/m... links to the extensive resources for running Rosetta simulations at scale to generate new training data, training METL models, running our models, and accessing our datasets. 8/

7 months ago 0 0 0 0

Fig. 6: Low-N GFP design.

We can use METL for low-N protein design. We trained METL on Rosetta simulations of GFP biophysical attributes and only 64 experimental examples of GFP brightness. It designed fluorescent 5- and 10-mutants, including some with mutations entirely outside the training set mutations. 7/

7 months ago 0 0 1 0

Fig. 5: Function-specific simulations improve METL pretraining for GB1.

A powerful aspect of pretraining on biophysical simulations is that the simulations can be customized to match the protein function and experimental assay. Our expanded simulations of the GB1-IgG complex with Rosetta InterfaceAnalyzer improve METL predictions of GB1 binding. 6/

7 months ago 0 0 1 0

Fig. 3: Comparative performance across extrapolation tasks.

We also benchmark METL on four types of difficult extrapolation. For instance, positional extrapolation provides training data from some sequence positions and tests predictions at different sequence positions. Linear regression completely fails in this setting. 5/
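Positional extrapolation can be illustrated with a small split function. This is a minimal sketch, not the splitting code from the METL paper: the `Variant` representation, the `positional_split` name, and the choice to drop variants that mix train and test positions are all assumptions made for illustration.

```python
# Minimal sketch of a positional extrapolation split: train and test variants
# never share a mutated sequence position.
import random
from typing import List, Set, Tuple

# A variant is (mutated positions, measured score); this layout is hypothetical.
Variant = Tuple[Tuple[int, ...], float]


def positional_split(
    variants: List[Variant],
    train_fraction: float = 0.8,
    seed: int = 0,
) -> Tuple[List[Variant], List[Variant], Set[int], Set[int]]:
    """Assign positions to train or test, then keep variants confined to one side."""
    positions = sorted({p for pos, _ in variants for p in pos})
    rng = random.Random(seed)
    rng.shuffle(positions)
    n_train = int(len(positions) * train_fraction)
    train_pos = set(positions[:n_train])
    test_pos = set(positions[n_train:])

    # Variants with no mutations (e.g. wild type) go to train only.
    train = [v for v in variants if set(v[0]) <= train_pos]
    test = [v for v in variants if v[0] and set(v[0]) <= test_pos]
    # Variants touching both position sets are dropped from both splits.
    return train, test, train_pos, test_pos
```

A model that only learns per-position effects, such as linear regression on one-hot encoded sequences, has no trained weights for the held-out positions, which is consistent with the failure described in the post.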

7 months ago 0 0 1 0
