Milot Mirdita (@milot) Bsky

Fast and accurate multiple-protein-sequence alignment at scale with FAMSA2 - Nature Biotechnology
FAMSA2 accurately aligns millions of protein sequences at high speed.

10 years after the first FAMSA paper, its successor is now published in Nat Biotech! We believe that FAMSA2 can enable analyses of large protein collections that were previously unattainable. Thank you, Andrzej and Cedric, for the great collaboration!
www.nature.com/articles/s41...

1 week ago 56 22 3 2

Metabuli & Metabuli App v1.2 improve novel-species classification with higher precision and recall. The new light mode is 1.8× faster and requires 50% less storage while maintaining precision. New RefSeq, GTDB, HRGM, and HROM databases have been added.
💾 github.com/steineggerla...
📄 doi.org/10.64898/2026.03.13.711249

1 week ago 30 18 1 0

GitHub - gbouras13/baktfold: Rapid & standardized genome annotation using protein structural information

Whenever I presented Phold, I was frequently asked, "Can you do the same beyond phages?" We (@oschwengers.bsky.social, @linsalrob.bsky.social, @binomicalabs.org et al.) finally did it with Baktfold: github.com/gbouras13/ba... www.biorxiv.org/content/10.6...

2 weeks ago 56 22 1 2

AFESM Clusters
Foldseek clustered 820M AlphaFold DB + ESMatlas structures

45 novel protein folds in the updated AFESM (AFDB + ESMatlas) manuscript:
• 12 high-confidence folds in AFESM
• 33 found by re-predicting 2.3M low-quality domains with ColabFold
We show that AFDB already captures most domains and that ESMFold struggles with novelty
🌏 afesm.foldseek.com
📄 biorxiv.org/content/10.1...

2 weeks ago 20 9 1 0

Baktfold: Sensitive protein functional annotation across the microbial tree of life using structural information www.biorxiv.org/content/10.64898/2026.03...

2 weeks ago 3 3 0 0

BRIGX - Browser-Based Ring Image Generator
Circular comparative genome visualization tool running entirely in your browser

The web application is available at:

https://brigx.genomicx.org/

I would be interested to hear feedback on bugs or any unexpected behaviour, suggestions for the user interface, or any features you feel would be useful.

3 weeks ago 2 1 0 0

Clustering the protein universe of life using DIAMOND DeepClust - Nature Methods
DIAMOND DeepClust provides an ultra-fast clustering method for organizing the protein universe of life at low sequence identity, enabling large-scale dimensionality reduction and improving downstream ...

Clustering proteins using DIAMOND is out now in @natmethods.nature.com: www.nature.com/articles/s41...

3 weeks ago 38 10 0 0

How much protein diversity can Life on Earth actually generate?

With DIAMOND DeepClust, we show how billions of proteins across the tree of life can be clustered at low identity for downstream analysis tasks.

📚Paper: www.nature.com/articles/s41...
💻Code: github.com/bbuchfink/di...

4 weeks ago 64 29 1 0

Adobe Acrobat

My group at MIT is seeking a research scientist with a strong *experimental* background to lead and help shape the lab’s experimental infrastructure, supporting efforts to advance AI-driven enzyme discovery and characterization.

See the full job description here: acrobat.adobe.com/id/urn:aaid:...

4 weeks ago 16 16 1 0

evedesign: accessible biosequence design with a unified framework
Unified protein design for computational researchers and experimentalists

Meet evedesign: open-source AI, accessible protein design
✅ Combine models for multi-objective optimization
✅ Integrate experimental data
✅ Run on your own infrastructure
📄Paper: www.biorxiv.org/content/10.6...
💻Code: github.com/evedesignbio
🌐Webserver: evedesign.bio
Collaborate: hello@evedesign.bio

4 weeks ago 27 6 0 0

The MMCA always has interesting exhibitions, and the area between it and Anguk has a really nice vibe (even ignoring the major tourist hotspots of the palace and Bukchon village).

A bit more offbeat: there are a few archery cafes, if you want to try a new sport.

1 month ago 2 0 0 0

Cherry blossom season is starting very soon (in ~1 week in the south, ~2 weeks in Seoul). Yeouido (around the National Assembly) is pretty good for 🌸 but will also get quite busy.

Taking a walk along Cheonggyecheon stream or the Han River is always nice (e.g., Nodeul Island, Banpo Bridge).

1 month ago 2 0 1 0

AlphaFold hits ‘next level’: the AI database now includes protein pairing
The database of 200 million protein-structure predictions now includes homodimers, adding new biological relevance.

@ecallaway.bsky.social wrote a news article on our AlphaFold complex work. Thank you for covering it.

📄 www.nature.com/articles/d41...

1 month ago 19 6 0 0

The AlphaFold Database has entered the era of complexes. Together with NVIDIA, DeepMind and EBI, we used ColabFold, OpenFold and MMseqs2-GPU to predict ~31 million complexes (homo- and heterodimers), resulting in 1.8 million high-quality predictions.
📄 research.nvidia.com/labs/dbr/ass...
🌐 alphafold.ebi.ac.uk

1 month ago 265 111 8 3

You asked, we listened. Millions of AI-predicted protein complex structures are now available in the #AlphaFold Database.

This spans homodimers from 20 of the most studied species, including humans, as well as the World Health Organization’s priority pathogens list.

www.ebi.ac.uk/about/news/t...

1 month ago 157 86 7 4

Efficient protein structure prediction from compact computers to datacenters with OpenFold-TRT www.biorxiv.org/content/10.64898/2026.03...

1 month ago 12 7 0 0

ProteinTTT is now easy to run on Hugging Face Spaces and Google Colab. We’ll also be presenting the paper at ICLR 2026 🇧🇷
🤗 Hugging Face Space: huggingface.co/spaces/pimen...
⚙️ Google Colab: colab.research.google.com/drive/1l_h7c...
🧵👇

1 month ago 40 9 3 0

Two-panel calibration plot (two benchmark dimer datasets) comparing predicted interchain contact-probability bins (x-axis) with the observed fraction of native interfacial contacts (y-axis). Points follow the diagonal, indicating close agreement between predicted probabilities and true interface-contact fractions.

My first manuscript in MPI colours! With @tothpetroczylab.bsky.social, we show that AlphaFold PAE-derived contact probabilities are well calibrated to the fraction of true interface contacts across experimentally determined protein dimers.

www.biorxiv.org/content/10.6...

1 month ago 24 10 1 2
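A calibration plot like the one described above is usually built by binning predicted probabilities and comparing each bin's mean prediction with the observed fraction of positives. The Python sketch below shows that generic reliability-diagram computation, assuming per-residue-pair contact probabilities (e.g. derived from PAE) and binary native-contact labels are already available. The function names, the equal-width binning, and the expected calibration error (ECE) summary are illustrative assumptions, not the preprint's actual pipeline.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not the preprint's code.
# Each bin: (mean predicted probability, observed fraction of true contacts, number of pairs)
Bin = Tuple[float, float, int]


def calibration_bins(probs: Sequence[float], is_contact: Sequence[bool], n_bins: int = 10) -> List[Bin]:
    """Group predicted contact probabilities into equal-width bins on [0, 1]
    and compare each bin's mean prediction with its observed contact fraction."""
    if len(probs) != len(is_contact):
        raise ValueError("probs and is_contact must have the same length")
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, contact in zip(probs, is_contact):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        sums[b] += p
        hits[b] += int(contact)
        counts[b] += 1
    # Empty bins are skipped; for a well-calibrated predictor, mean prediction ≈ observed fraction.
    return [(sums[b] / counts[b], hits[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b] > 0]


def expected_calibration_error(bins: List[Bin]) -> float:
    """Count-weighted mean absolute gap between predicted and observed contact rates."""
    total = sum(n for _, _, n in bins)
    if total == 0:
        return 0.0
    return sum(n / total * abs(p - f) for p, f, n in bins)


if __name__ == "__main__":
    # Toy, made-up values for illustration only (not data from the preprint).
    probs = [0.05, 0.15, 0.85, 0.95]
    labels = [False, False, True, True]
    bins = calibration_bins(probs, labels, n_bins=2)
    for mean_p, frac, n in bins:
        print(f"predicted {mean_p:.2f}  observed {frac:.2f}  (n={n})")
    print(f"ECE = {expected_calibration_error(bins):.3f}")
```

Plotting the first two values of each bin against each other gives the calibration curve; points on the diagonal mean the predicted probabilities match the observed contact fractions.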

Release SeqKit v2.13.0 (10-year-old birthday version) · shenwei356/seqkit
Changelog SeqKit is 10 years old! SeqKit v2.13.0 - 2026-02-28 seqkit: add support for reading and writing LZ4 compression format. new command: seqkit sample2: improved seqkit sample by @stahiga....

Can't wait to release a 10-year-old birthday version for SeqKit!

- 10 years
- 2 papers, 3500 citations
- 20 contributors
- 40 subcommands
- 880 commits
- 500 issues
- 685.5K total Bioconda downloads

Thank you all, dear contributors and users!
I'll keep maintaining it.

github.com/shenwei356/s...

1 month ago 125 35 6 1

At the 132nd International Titisee Conference on Biology 2.0: The AI Revolution in Biology & Medicine

From sequence→function models 🧬
to protein & generative structure models 🧪
to AI of cell states & perturbations 🧫

Great science, great friends, beautiful lake. Thanks @BIFonds!

1 month ago 16 2 0 0

A new version of our bioRxiv preprint about bioRxiv is up. Now that’s what I call a revision: 6 years after the first version!
It has new data about our progress and highlights from a massive user survey. 1/n
www.biorxiv.org/content/10.1...

1 month ago 78 44 1 4

Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...

2 months ago 83 35 3 1

Annotating genomes at increased scale and resolution
Nature Reviews Genetics - In this Review, Ji et al. overview how rapidly advancing experimental and computational methods are enabling improved and automated annotation of gene structure and...

Our new review on genome annotation just appeared in @naturerevgenet.bsky.social, with a particular focus on the human genome, with Hayden Ji and Mihaela Pertea: rdcu.be/e4mI1

2 months ago 24 12 0 0

Introducing The Structural History of Eukarya (SHE): The first proteome-scale phylogeny constructed entirely from 3D structure.
We computed 300 trillion alignments across 1,542 species to map the tree of life. 🧵👇 (1/5)

2 months ago 85 40 2 0

Compbio Asia

Please spread the word:

We invite applications to a two-week Computational Biology workshop in Singapore, June 14-27.

This NSF-funded workshop brings together 16-20 US grad students with international peers.
Apply by March 21: compbioasia.net
🧵 Details below:

2 months ago 3 9 2 2

Distance-Restraint-Guided Diffusion Models for Sampling Protein Conformational Changes and Ligand Dissociation Pathways
Tatsuki Hori, Yoshitaka Moriwaki, Ryuichiro Ishitani
www.biorxiv.org/content/10.6...
Our new preprint is out.

2 months ago 6 2 0 0

Multiple protein structure alignment at scale with FoldMason
Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended ou...

FoldMason is out now in @science.org. It generates accurate multiple structure alignments for thousands of protein structures in seconds. Great work by Cameron L. M. Gilchrist and @milot.bsky.social.
📄 www.science.org/doi/10.1126/...
🌐 search.foldseek.com/foldmason
💾 github.com/steineggerla...

2 months ago 301 147 4 3

AmpliPhy improves gene trees by adding homologs without affecting alignments
In phylogenomics, gene tree reconstruction depends on multiple sequence alignment (MSA) and tree inference, and ongoing work continues to improve inference quality. Denser taxon sampling has been associated with improved gene tree inference, suggesting that adding homologs could be a practical route to higher accuracy as sequence databases continue to expand. However, adding sequences can influence multiple steps of typical inference pipelines, and little is known about its specific effect on the multiple sequence alignment, tree reconstruction, and rooting steps. We performed a large-scale empirical benchmark to quantify how homolog enrichment affects alignment and phylogenetic inference. Using an enrichment-impoverishment design and a measure of tree accuracy based on taxonomic congruence, we found that enrichment consistently improves tree inference quality, while effects on alignment quality are marginal. We show that this improvement is associated with accurate root placement on enriched trees when enrichment is accompanied by sensitive homolog search. Notably, much of the benefit can be retained with relatively compact alignments produced by sequence addition. Building on these observations, we provide a tool, AmpliPhy, which efficiently improves phylogenetic reconstruction of protein families through homolog enrichment. The AmpliPhy open-source pipeline software is available at https://github.com/DessimozLab/ampliphy.
Competing Interest Statement: The authors have declared no competing interest.
Swiss National Science Foundation, https://ror.org/00yjd3n13, 216623, 10005715

Can ever-increasing sequence databases improve phylogenetic reconstruction of a gene family? Our new preprint introduces AmpliPhy, a pipeline that automates homolog enrichment to improve gene tree inference, built on a robust phylogenomic benchmark scheme. 🧵1/n
📃 doi.org/10.64898/2026.01.26.701724

2 months ago 26 15 1 0

Milot’s venture into establishing his own lab is incredibly exciting. I highly recommend joining Milot on his mission to advance molecular biology through open-source bioinformatics.

3 months ago 36 3 0 0

Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning
Mirdita Lab builds scalable bioinformatics methods.

My time in @martinsteinegger.bsky.social's group is ending, but I’m staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org

3 months ago 104 55 7 1
