It’s that wonderful time of year again. A new GTDB release is out :)
Posts by Antonio Camargo
Congratulations, @sdeorowicz.bsky.social!
10 years after the first FAMSA paper, its successor is now published in Nat Biotech! We believe that FAMSA2 can enable analyses of large protein collections that were previously unattainable. Thank you, Andrzej and Cedric, for great collaboration
www.nature.com/articles/s41...
It doesn't really test the codes explicitly. It tests several gene models (some of which use alternative genetic codes) and picks the best one. If you are looking for new genetic codes, this won't work
It was a while ago, so I don't really remember the reasons. It could have been that I wanted something very specific... I do recall that I could not find a clear reference. Sorry for the bad feedback :/
Again, I'm not super skilled in Rust, so maybe it was something that you would consider trivial.
I tried to integrate it into a package a while ago, but my impression is that you didn't design it to be used as a library (could just be me being terrible at Rust), so I ended up dropping the idea
For conjugation, I recommend CONJscan.
For replication initiators, it's more complicated. You can try MOB-Typer (but it is a bit tough to get working) or finding the replication initiator with Pfams and then looking into the closest homologs. I'm not aware of an easy-to-run pipeline :/
We've been looking into ways to build a more encompassing classification system. As @titus.idyll.org mentioned, there's so much HGT that the assumptions that we use for bacterial taxonomy don't make sense anymore. But there might be a way to find a meaningful pattern within the HGT web :)
I'd say that the best thing you can do at the moment is characterize environmental plasmids according to their conjugation system and replication initiator family.
That's a good question. Right now, I believe COPLA is the best "taxonomy-like" thing that we have. But the software is tough to run and the database is biased towards plasmids from cultivated bacteria (which is unavoidable)
🦠🧪🧬🚨 New paper and database alert: the new IMG/VR release is now MetaVR! We have a new website - meta-virome.org - with quick search capabilities for the >24M viruses, >12M vOTUs, and >42M protein clusters (including >790k with predicted structures!). academic.oup.com/nar/advance-...
This looks really good! Congratulations!
We're very happy to release our new database, Metalog (metalog.embl.de)! It offers manually curated and harmonised contextual data for 110k metagenomics samples across the globe, incl. precomputed taxonomic profiles, for interactive browsing and for download 🧵 1/7
#microsky
A stylized infographic showing the workflow for building a global soil plasmidome resource on the left and a textured world map on the right. The workflow depicts three input data streams from metagenomic datasets and isolate plasmids, which pass through steps like quality control, clustering, functional annotation, CRISPR analysis, host assignment, and detection of gene categories such as biosynthetic clusters, antimicrobial resistance, antimicrobial peptides, and CAZymes. All outputs feed into a central SQL database. The world map shows sample locations across the globe as teal circles of varying size, highlighting regions from which many plasmids were recovered. Adapted from Fig 1A and 1B in Fiamenghi et al., doi:10.1038/s41467-025-65102-6
Soils contain an amazing diversity of functions encoded in plasmids.
The Global Soil Plasmidome Resource: 98,728 soil plasmids from 6,860 samples.
Led by @mattlabguy.bsky.social and @apcamargo.bsky.social at @jgi.doe.gov @biosci.lbl.gov @berkeleylab.lbl.gov
www.nature.com/articles/s41...
That's nice to hear :) Feel free to provide any feedback
Thanks, @acritschristoph.bsky.social!
@yishay.bsky.social @jimshaw.bsky.social @simrouxvirus.bsky.social @jgi.doe.gov @berkeleylab.lbl.gov
This project came together thanks to many amazing people (see tweet below for handles). A special thanks to Stephen Nayfach, who kicked off UHGV and helped guide it all the way through.
To facilitate adoption by the community, we provide online tools to allow users to explore UHGV in the browser. If you don't mind using the command line, we also provide all of the data for download :)
🌐 uhgv.jgi.doe.gov (8/8)
Taking advantage of the genomic diversity in UHGV, we used comparative genomics to examine in detail diversity-generating retroelements, methyltransferases, and endolysins, proposing mechanisms by which these functions enhance a phage’s capacity to infect new hosts. (7/8)
We then examined the genetic factors underlying broader host range and found that functions involved in phage-host interactions across multiple stages of the infection cycle shape a phage's ability to switch hosts. (6/8)
Using UHGV, we profiled thousands of human gut metagenomes and identified a subset of hyperprevalent phages found around the globe. Leveraging host prediction data, we found that these phages have markedly higher host ranges. (5/8)
Another key challenge in virome research is that most viruses lack taxonomic classification, leading to ad hoc approaches that hinder cross-study comparisons. To address this, we developed a taxonomy-like framework and a tool for assigning users' genomes to UHGV clusters. (4/8)
Viral proteins are difficult to annotate, making it hard to infer their biology from genomes. We developed a pipeline that integrates sequence- and structure-based methods to improve functional annotation and reveal novel protein domains. (3/8)
The Unified Human Gastrointestinal Virome (UHGV) includes 873,994 viral genomes recovered from the microbiomes of globally diverse populations. Its scale and genome quality make UHGV a valuable reference for future studies of human gut viromes worldwide. (2/8)
🚨New preprint out!
We present a foundational genomic resource of human gut microbiome viruses. It delivers high-quality, deeply curated data spanning taxonomy, predicted hosts, structures, and functions, providing a reference for gut virome research. (1/8)
www.biorxiv.org/content/10.1...
The old internet was a better place
We're thrilled to announce SeqHub, an AI-enabled platform for biological sequence analysis. SeqHub brings together sequence search, genome annotation, and data sharing in one place.
Our @narjournal.bsky.social manuscript is out! It explores the growth of the GTDB (gtdb.ecogenomic.org) since its inception, as well as updates to the website, methodology, policies, and major taxonomic and nomenclatural changes over the past three years.
academic.oup.com/nar/advance-...