Friday was my last day at NHGRI. After 10 wonderful years, my lab is headed to Johns Hopkins University
genomeinformatics.github.io/movingday/
Posts by Jim Shaw
GTDB release 11 based on RefSeq 232 (R11-RS232) is live at gtdb.ecogenomic.org. This release covers 901,341 genomes (23% increase) and has 199,923 species clusters (39% increase). Release notes at: forum.gtdb.ecogenomic.org/t/announcing.... Release statistics at: gtdb.ecogenomic.org/stats/r232.
10 years after the first FAMSA paper, its successor is now published in Nat Biotech! We believe that FAMSA2 can enable analyses of large protein collections that were previously unattainable. Thank you, Andrzej and Cedric, for the great collaboration.
www.nature.com/articles/s41...
This simulation-cum-benchmark study on MAG making by @tkorem.bsky.social & team looks really interesting. Loads of plots and results to work through!
www.biorxiv.org/content/10.6...
Metabuli & Metabuli App v1.2 improve novel species classification with higher precision and recall. New light mode is 1.8× faster and requires 50% less storage while keeping precision. New RefSeq, GTDB, HRGM, and HROM databases added.
github.com/steineggerla...
doi.org/10.64898/2026.03.13.711249
With 135 complete, circular Pelagibacter genomes, we have answered some of the most outstanding questions about what is often considered the most abundant organism on the planet, with roughly 10 million times more individuals in the ocean than stars in the universe. Check out our preprint.
Pelagibacter, resolved www.biorxiv.org/content/10.64898/2026.04...
Actinomarina, resolved www.biorxiv.org/content/10.64898/2026.04...
Whenever I presented Phold, I was frequently asked "can you do the same beyond phages?" We ( @oschwengers.bsky.social @linsalrob.bsky.social @binomicalabs.org et al) finally did it with Baktfold github.com/gbouras13/ba... www.biorxiv.org/content/10.6...
Following up on this - MADRe is now officially published
Very grateful for the guidance of @msikic.bsky.social @rvicedomini.bsky.social and Kresimir Krizanovic
academic.oup.com/gigascience/...
Want to annotate a bacterial genome with structures?
@oschwengers.bsky.social bakta and @gbouras13.bsky.social phold got together, and the result is Baktfold: protein annotation across the microbial tree of life using structures
www.biorxiv.org/content/10.6...
#phagesky #microsky #microbiomesky
Our work on 'hidden diversity' in unbinned contigs is now published in @natmicrobiol.nature.com :
www.nature.com/articles/s41...
See the linked threads for more details!
Cenote-Taker 3's manuscript has now been reviewed and published in @peercomjournal.bsky.social !
I love the mission and ethics of PCJ, focusing on rigorous peer review, open access, and author involvement in publishing. I highly recommend it to others!
🧬🖥️
peercommunityjournal.org/item/10.2407...
🧬 pathotypr is now on Bioconda! @pathogenomics.bsky.social
conda install -c bioconda pathotypr
Alignment-free lineage classification & drug resistance genotyping from WGS. Works with any pathogen: bring your own SNP markers.
⚡ Rust, ~1s/sample
🖥️ CLI + GUI
github.com/PathoGenOmics-Lab/pathotypr
With myloasm (amazing work by @jimshaw.bsky.social @mgmarin.bsky.social @lh3lh3.bsky.social ), deeper sequencing, and VAE clustering, could we actually decompose a metagenome? I wouldn't say we completely decomposed the metagenome, but I think we've gotten further than nearly anyone else I know.
What happens when you use a strain-resolved assembler on 720 Gbp of Nanopore metagenome sequence? Simultaneous strain-level resolution of multiple co-occurring lineages from hundreds of single contig high-quality genomes, including 78 Pelagibacter.
@tnn1.bsky.social and myself:
Studying DNA modification in microbes shouldn't be limited by methodology. 🧬
We’re excited to introduce MODIFI, our new scalable method for detecting DNA modifications in PacBio metagenomic data and estimating ECE-host linkage. Check out the preprint: www.biorxiv.org/content/10.6...
LongcallR for competitive SNP calling and haplotype phasing, and simplified allele-specific analysis with long RNA-seq reads. Found ~100 junctions affected by SNPs per sample with most junctions novel.
Developed by Neng Huang. Published in @natmethods.nature.com. Read at rdcu.be/faKhL
A run-length-compressed skiplist data structure for dynamic GBWTs supports time and space efficient pangenome operations over syncmers www.biorxiv.org/content/10.64898/2026.03...
Thank you Gaetan!
Thanks Steven!!
Myloasm, our long-read metagenome assembler, is now published! w/ @mgmarin.bsky.social and @lh3lh3.bsky.social
Very rewarding after > a year of development and countless hours thinking about assembly. Thanks to beta testers, Li lab, and reviewers who gave very helpful feedback.
rdcu.be/famFj
Thanks Margot!!
_720 Gbp_ marine nanopore metagenome -> 328 circular prokaryotic contigs: using myloasm!
Insane work by Lui and Nielsen. Also shows how modern long read assemblies can disentangle coexisting strains and reveal ecological insights.
Long reads carry multiple small vars and SVs and their phasing. LongcallD is the only caller that tightly integrates germline/mosaic small/structural vars/MEIs and their phasing in a single C program. One command line to get competitive small variant calls and better SVs. Led by Yan Gao.
Recently amplified gene arrays are a super interesting phenomenon, but many still resist our attempts to assemble them. @dantipov.bsky.social has developed a new method (Trivial Tangle Traverser) that resolves assembly graph tangles caused by such sequences (1/4) www.biorxiv.org/content/10.6...
I just had that conversation earlier this week. The college dean is deciding if they will (dis)continue the bioinformatics training program for grad students. An argument for discontinuation is that students will use GenAI to help them code, so they don’t need to learn bioinformatics.