
Posts by Christian Dallago

Video

AlphaFold database has entered the era of complexes. Together with NVIDIA, DeepMind and EBI, we use ColabFold, OpenFold and MMseqs2-GPU to predict ~31 million complexes (homo- & heterodimers), resulting in 1.8 million high-quality predictions
📄 research.nvidia.com/labs/dbr/ass...
🌐 alphafold.ebi.ac.uk

1 month ago 265 111 8 3

You asked, we listened. Millions of AI-predicted protein complex structures are now available in the #AlphaFold Database.

This spans homodimers from 20 of the most studied species, including humans, as well as the World Health Organization’s priority pathogens list.

www.ebi.ac.uk/about/news/t...

1 month ago 157 86 7 4
FLIP2: Expanding Protein Fitness Landscape Benchmarks. A comprehensive benchmark for protein fitness prediction with 7 datasets, 16 splits, and real-world engineering scenarios

Find out more: flip.protein.properties

1 month ago 2 1 0 0

I'm especially happy about continuing to work with an amazing group of scientists. Thanks @kevinkaichuang.bsky.social , @kdidi.bsky.social , Bruce Wittmann, @kadinaj.bsky.social, Maya Czeneszew, @sarahalamdari.bsky.social, @alexijie.bsky.social, @thisismadani.bsky.social, ++

1 month ago 1 0 1 0

I love the gritty work here: no new architectures, no leaderboard-topping numbers to screenshot. But it's how we (and hopefully the community) can measure whether the models we're all building and using are getting better where it counts — at the bench.

1 month ago 1 0 1 0

That's not a criticism of any method. New benchmarks are needed precisely to see where we stand, and to set targets for where we could go from here.

1 month ago 1 0 1 0

Especially on the wild-type and position splits, current transfer learning doesn't consistently win. No single pLM architecture dominates. Scaling hasn't closed the gap yet.

1 month ago 1 0 1 0

The answer in 2026 is largely the same. Simple ridge regression on one-hot sequences, optionally supplemented with zero-shot pLM likelihoods, often matches or outperforms fine-tuned protein language models.

1 month ago 2 1 1 0
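That ridge baseline is easy to sketch. The snippet below is a toy illustration, not FLIP2's actual code: the sequences, fitness values, and regularization strength are made-up assumptions, and ridge is fit in closed form on one-hot encoded sequences.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_IDX = {a: i for i, a in enumerate(AA)}

def one_hot(seq):
    """Flatten a sequence into a (length * 20) one-hot vector."""
    x = np.zeros((len(seq), len(AA)))
    for pos, aa in enumerate(seq):
        x[pos, AA_IDX[aa]] = 1.0
    return x.ravel()

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy fitness data: fitness tracks the residue at position 1.
seqs = ["ACDE", "AADE", "ACDA", "AAAA"]
y = np.array([1.0, 0.0, 1.0, 0.0])

X = np.stack([one_hot(s) for s in seqs])
w = ridge_fit(X, y, lam=0.1)
preds = X @ w  # higher for sequences with 'C' at position 1
```

The appeal of this baseline is that it is linear in per-position residue identities, so it needs very little data and no pretraining to fit.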

FLIP2 adds seven new sequence-fitness landscapes - industrial enzymes, nucleases, rhodopsins, protein-protein interactions - and 16 splits that test the generalization axes protein engineers really hit: more mutations, new positions, higher fitness, different wild-types.

1 month ago 1 0 1 0
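A split along the "more mutations" axis can be sketched as training on low-order mutants and testing on higher-order ones. This is a hypothetical illustration, not FLIP2's split code; the wild-type and library below are toy data:

```python
def n_mutations(seq, wildtype):
    """Hamming distance to the wild-type (equal-length sequences)."""
    return sum(a != b for a, b in zip(seq, wildtype))

def mutation_split(seqs, wildtype, max_train_muts=1):
    """Train on sequences with few mutations; test on the rest."""
    train = [s for s in seqs if n_mutations(s, wildtype) <= max_train_muts]
    held_out = [s for s in seqs if n_mutations(s, wildtype) > max_train_muts]
    return train, held_out

wt = "ACDE"
library = ["ACDE", "ACDA", "AADE", "AADA", "AAAA"]
train, held_out = mutation_split(library, wt, max_train_muts=1)
# train: wild-type and single mutants; held_out: double+ mutants
```

The same pattern generalizes to the other axes: filter the training set by position, fitness percentile, or wild-type identity instead of mutation count.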

We were interested in how things had changed 5 years after our first release. So, we built FLIP2 on select datasets from great labs across the world, many of which have graciously agreed to make their data freely available.

1 month ago 1 0 1 0

FLIP spawned the fast development of several different benchmarking efforts across protein design, engineering, and variant effect assessment.

The answer in 2021 was: sometimes, but simpler models hold up surprisingly well.

1 month ago 1 0 1 0

Five years ago, we released FLIP. The core question was: can ML models for protein fitness prediction generalize in the ways that actually matter for protein engineering, i.e., low data, extrapolation to more mutations, out-of-distribution sequences?

1 month ago 4 5 2 0

We made FLIP2, a protein fitness benchmark spanning seven new datasets, including enzymes, protein-protein interactions, and light-sensitive proteins, as well as splits that measure generalization relevant to real-world protein engineering campaigns.

1 month ago 52 15 1 1

You can use the model right now to freely generate families for single sequence inputs (i.e., diversification conditioned by intrinsic representations of evolution), or to engineer proteins based on family prompts (diversification by conditioning on particular evolutionary trajectories).

3 months ago 1 1 0 0

In essence, we probed the model's ability to recapitulate family statistics, bootstrap protein structure prediction, and assess mutation effects, demonstrating excellent performance across all tasks, especially using test-time scaling via prompt conditioning.

3 months ago 0 0 1 0
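Likelihood-based mutation-effect assessment is commonly done as a log-likelihood ratio between mutant and wild-type residues. A minimal sketch, assuming a per-position log-probability function stands in for the model (the `toy_logprob` numbers below are made up, not ProFam-1's outputs):

```python
import math

def zero_shot_score(wt, mutant, logprob):
    """Sum over mutated positions of log p(mut aa) - log p(wt aa).

    `logprob(pos, aa)` is any per-position log-probability from a
    likelihood model; positive scores mean the model prefers the mutant.
    """
    return sum(
        logprob(i, m) - logprob(i, w)
        for i, (w, m) in enumerate(zip(wt, mutant))
        if w != m
    )

# Toy stand-in for a model's per-position probabilities.
toy_probs = {(1, "C"): 0.7, (1, "A"): 0.1}
def toy_logprob(pos, aa):
    return math.log(toy_probs.get((pos, aa), 0.2))

score = zero_shot_score("ACDE", "AADE", toy_logprob)
# negative: the toy model prefers 'C' over 'A' at position 1
```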

With ProFam-1, we scaled learning from single sequences to protein family definitions of different kinds, curating a large protein family corpus, ProtFam-atlas. I'm particularly stoked about the idea of inference-time compute. This contribution lays out a very exciting path for future work.

3 months ago 0 0 1 0

Our latest protein family-based GenAI collection of tools and datasets, ProFam, is out now. Everything -- from data, training and inference code, to the 215M Llama-based ProFam-1 model -- is fully open sourced.

🧵

3 months ago 5 1 2 0
Preview
Tenure-Track Assistant Professor Position – AI/ML for Cell Biology - Durham, North Carolina (US) job with Duke University School of Medicine | 12844591

Another exciting opportunity, this time as a colleague at Duke! Join as tenure track assistant prof. in Cell Bio & let’s work on closing the gap between in-silico and in-vivo: www.nature.com/naturecareer...

Important: application closes Nov 1st!!!

5 months ago 3 1 0 0
Preview
Senior Applied Research Scientist, Multiscale Biology | NVIDIA Corporation Apply your expertise in engineering biology through algorithms and tools for genes, tissues, organisms, and populations. Conduct collaborative applied research in multiscale biology using deep learnin...

Another opening: Senior Multiscale Biology Applied Research Scientist!

nvidia.eightfold.ai/careers/job/...

Are you fascinated by fundamental data modalities across biology, like RNA-seq and mass spec, and do you want to build computational tools that harness data to build intelligence?

Come join the team!

6 months ago 4 1 0 0

I don’t dare question the HR gods about their designs :)

6 months ago 0 0 1 0
Preview
Senior Applied Research Scientist, Bioinformatics | NVIDIA Corporation Lead applied and collaborative research programs using bioinformatics, high performance computing, and deep learning for biological advancements. Develop and accelerate bioinformatics software and alg...

Are you passionate about leading collaborative, fast-moving, applied bioinformatics research projects that help the entire community move forward?

Apply to work in my team at NVIDIA: nvidia.eightfold.ai/careers/job/...

6 months ago 5 0 1 0

I should add: structure prediction inference is INCREDIBLY efficient for the form factor and power profile.

6 months ago 0 0 0 0

It's an inference monster.

Structure prediction on it works.

Design to come.

This will be updated later today... research.nvidia.com/labs/dbr/ass...

6 months ago 3 0 1 0
Preview
ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new pred...

Great talk by @machine.learning.bio at the 5th Virtual @chembiotalks.bsky.social. He talked about the use of #MachineLearning and #BigData to address biological questions. Cool insights into both predicting functions and designing proteins
ieeexplore.ieee.org/document/947...
arxiv.org/abs/2503.00710

6 months ago 6 2 0 0
Preview
GPU-accelerated homology search with MMseqs2 - Nature Methods Graphics processing unit-accelerated MMseqs2 offers tremendous speedups for homology retrieval from metagenomic databases, query-centered multiple sequence alignment generation for structure predictio...

GPU-accelerated MMseqs2 offers tremendous speedup for homology retrieval, protein structure prediction with ColabFold, and protein structure search with Foldseek. @martinsteinegger.bsky.social @milot.bsky.social @machine.learning.bio

www.nature.com/articles/s41...

7 months ago 81 21 0 0
Preview
From AlphaFold to MMseqs2-GPU: How AI is Accelerating Protein Science Podcast Episode · NVIDIA AI Podcast · 09/10/2025 · 35m

podcasts.apple.com/us/podcast/f...

7 months ago 10 3 0 1

Looking forward to hearing about the potential of machine learning for #Biology and #DrugDiscovery from an industry perspective. Register for the Virtual @chembiotalks.bsky.social to hear the perspective of Chris Dallago (@machine.learning.bio) from Nvidia.
#ChemBio #Chemsky #ML #MachineLearning

9 months ago 10 5 0 0

I still feel criminal for the handle but the manuscript embodies it well.

9 months ago 2 0 0 0

Moore’s law applied to speed, not accuracy. I don’t think the discoveries we are after fundamentally depend on speed alone.

I think the better law here is garbage in garbage out.

In that sense, you can wait for better data/curation, but it’s also fun to take destiny into your own hands :)

9 months ago 2 0 1 0
Preview
Computational exploration of global venoms for antimicrobial discovery with Venomics artificial intelligence - Nature Communications Researchers used artificial intelligence to mine global venom proteomes and discovered novel peptides with antimicrobial activity. Several candidates showed efficacy against drug-resistant bacteria in...

(1/5) Venoms are a vast, largely untapped library of bioactive molecules—and our new paper in @natcomms.nature.com ‬ @natprot.nature.com reveals just how powerful they can be. 🐍⚡️

9 months ago 4 3 1 0