Posts by Jim Shaw

We're moving! Friday was my last day at NHGRI. After 10 wonderful years, my lab is headed one hour north on I-95 to set up shop at Johns Hopkins University. This is a very bittersweet move for me, as NHGRI has prov...

Friday was my last day at NHGRI. After 10 wonderful years, my lab is headed to Johns Hopkins University
genomeinformatics.github.io/movingday/

1 day ago 132 19 15 1
The AI Rewrite Dilemma

Blog post on "The AI Rewrite Dilemma": lh3.github.io/2026/04/17/t...

4 days ago 54 29 3 4
GTDB - Genome Taxonomy Database The Genome Taxonomy Database (GTDB) is an initiative to establish a standardised microbial taxonomy based on genome phylogeny.

GTDB release 11 based on RefSeq 232 (R11-RS232) is live at gtdb.ecogenomic.org. This release covers 901,341 genomes (23% increase) and has 199,923 species clusters (39% increase). Release notes at: forum.gtdb.ecogenomic.org/t/announcing.... Release statistics at: gtdb.ecogenomic.org/stats/r232.

6 days ago 49 29 0 5
Fast and accurate multiple-protein-sequence alignment at scale with FAMSA2 - Nature Biotechnology FAMSA2 accurately aligns millions of protein sequences at high speed.

10 years after the first FAMSA paper, its successor is now published in Nat Biotech! We believe that FAMSA2 can enable analyses of large protein collections that were previously unattainable. Thank you, Andrzej and Cedric, for great collaboration
www.nature.com/articles/s41...

1 week ago 56 22 3 2

This simulation-cum-benchmark study on MAG making by @tkorem.bsky.social & team looks really interesting. Loads of plots and results to work through!

www.biorxiv.org/content/10.6...

1 week ago 19 4 1 1

Metabuli & Metabuli App v1.2 improve novel species classification with higher precision and recall. New light mode is 1.8× faster and requires 50% less storage while keeping precision. New RefSeq, GTDB, HRGM, and HROM databases added.
💾 github.com/steineggerla...
📄 doi.org/10.64898/2026.03.13.711249

1 week ago 30 18 1 0

With 135 complete, circular Pelagibacter genomes, we have answered some of the most outstanding questions about what is often considered the most abundant organism on the planet, with roughly 10 million times more individuals in the ocean than stars in the universe. Check out our preprint.

1 week ago 8 5 3 0

Pelagibacter, resolved www.biorxiv.org/content/10.64898/2026.04...

2 weeks ago 3 3 0 1

Actinomarina, resolved www.biorxiv.org/content/10.64898/2026.04...

2 weeks ago 1 1 1 0
GitHub - gbouras13/baktfold: Rapid & standardized genome annotation using protein structural information Rapid & standardized genome annotation using protein structural information - gbouras13/baktfold

Whenever I presented Phold, I was frequently asked "can you do the same beyond phages?" We ( @oschwengers.bsky.social @linsalrob.bsky.social @binomicalabs.org et al) finally did it with Baktfold github.com/gbouras13/ba... www.biorxiv.org/content/10.6...

2 weeks ago 56 22 1 2

Following up on this - MADRe is now officially published 🎉

Very grateful for the guidance of @msikic.bsky.social @rvicedomini.bsky.social and Kresimir Krizanovic

🔗 academic.oup.com/gigascience/...

2 weeks ago 9 6 1 1

Want to annotate a bacterial genome with structures?

@oschwengers.bsky.social bakta and @gbouras13.bsky.social phold got together, and the result is Baktfold: protein annotation across the microbial tree of life using structures

www.biorxiv.org/content/10.6...

#phagesky #microsky #microbiomesky

2 weeks ago 76 34 0 0

Our work on 'hidden diversity' in unbinned contigs is now published in @natmicrobiol.nature.com :

www.nature.com/articles/s41...

See the linked threads for more details!

2 weeks ago 67 40 3 1

Cenote-Taker 3's manuscript has now been reviewed and published in @peercomjournal.bsky.social ! 🎉
I love the mission and ethics of PCJ, focusing on rigorous peer review, open access, and author involvement in publishing. I highly recommend it to others!
🧬🖥️
peercommunityjournal.org/item/10.2407...

2 weeks ago 8 1 0 0
Package Recipe 'pathotypr' - Bioconda documentation

🧬 pathotypr is now on Bioconda! @pathogenomics.bsky.social
conda install -c bioconda pathotypr

Alignment-free lineage classification & drug resistance genotyping from WGS. Works with any pathogen: bring your own SNP markers.
⚡ Rust, ~1s/sample
🖥️ CLI + GUI
🔗 github.com/PathoGenOmics-Lab/pathotypr

2 weeks ago 14 4 0 0

With myloasm (amazing work by @jimshaw.bsky.social @mgmarin.bsky.social @lh3lh3.bsky.social ), deeper sequencing, and VAE clustering, could we actually decompose a metagenome? I wouldn't say we completely decomposed the metagenome, but I think we've gotten further than nearly anyone else I know.

3 weeks ago 2 1 1 0

What happens when you use a strain-resolved assembler on 720 Gbp of Nanopore metagenome sequence? Simultaneous strain-level resolution of multiple co-occurring lineages from hundreds of single contig high-quality genomes, including 78 Pelagibacter.

@tnn1.bsky.social and myself:

3 weeks ago 16 4 2 0

Studying DNA modification in microbes shouldn't be limited by methodology. 🧬
We're excited to introduce MODIFI, our new scalable method for detecting DNA modifications in PacBio metagenomic data and estimating ECE-host linkage. Check out the preprint: www.biorxiv.org/content/10.6...

3 weeks ago 18 13 4 1
SNP calling, haplotype phasing and allele-specific analysis with long RNA-seq reads Nature Methods - In this study, long-read RNA sequencing achieves accurate single-nucleotide polymorphism calling, haplotype phasing and allele-specific expression analysis.

LongcallR for competitive SNP calling and haplotype phasing, and simplified allele-specific analysis with long RNA-seq reads. Found ~100 junctions affected by SNPs per sample with most junctions novel.

Developed by Neng Huang. Published in @natmethods.nature.com. Read at rdcu.be/faKhL

3 weeks ago 43 18 0 0

A run-length-compressed skiplist data structure for dynamic GBWTs supports time and space efficient pangenome operations over syncmers www.biorxiv.org/content/10.64898/2026.03...

3 weeks ago 6 3 0 0

Thank you Gaetan!

3 weeks ago 0 0 0 0

Thanks Steven!!

3 weeks ago 1 0 1 0

Myloasm, our long-read metagenome assembler, is now published! w/ @mgmarin.bsky.social and @lh3lh3.bsky.social

Very rewarding after > a year of development and countless hours thinking about assembly. Thanks to beta testers, Li lab, and reviewers who gave very helpful feedback.

rdcu.be/famFj

3 weeks ago 98 56 4 1

Thanks Margot!! 😊

3 weeks ago 0 0 0 0
Clustering the protein universe of life using DIAMOND DeepClust - Nature Methods DIAMOND DeepClust provides an ultra-fast clustering method for organizing the protein universe of life at low sequence identity, enabling large-scale dimensionality reduction and improving downstream ...

Clustering proteins using DIAMOND is out now @natmethods.nature.com www.nature.com/articles/s41...

3 weeks ago 38 10 0 0

_720 Gbp_ marine nanopore metagenome -> 328 circular prokaryotic contigs: using myloasm!

Insane work by Lui and Nielsen. Also shows how modern long read assemblies can disentangle coexisting strains and reveal ecological insights.

4 weeks ago 47 13 2 0

Long reads carry multiple small vars and SVs and their phasing. LongcallD is the only caller that tightly integrates germline/mosaic small/structural vars/MEIs and their phasing in a single C program. One command line to get competitive small variant calls and better SVs. Led by Yan Gao.

4 weeks ago 45 21 0 1

Recently amplified gene arrays are a super interesting phenomenon, but many still resist our attempts to assemble them. @dantipov.bsky.social has developed a new method (Trivial Tangle Traverser) that resolves assembly graph tangles caused by such sequences (1/4) www.biorxiv.org/content/10.6...

4 weeks ago 27 11 1 0

I just had that conversation earlier this week. The college dean is deciding if they will (dis)continue the bioinformatics training program for grad students. An argument for discontinuation is that students will use GenAI to help them code, so they don't need to learn bioinformatics.

1 month ago 163 66 18 5