
Posts by Wei Shen 沈 伟

GTDB - Genome Taxonomy Database The Genome Taxonomy Database (GTDB) is an initiative to establish a standardised microbial taxonomy based on genome phylogeny.

GTDB release 11 based on RefSeq 232 (R11-RS232) is live at gtdb.ecogenomic.org. This release covers 901,341 genomes (23% increase) and has 199,923 species clusters (39% increase). Release notes at: forum.gtdb.ecogenomic.org/t/announcing.... Release statistics at: gtdb.ecogenomic.org/stats/r232.

5 days ago 49 29 0 5

Modern DRAM is based on a brilliant design from IBM.

But, we're still paying for a latency penalty that's existed since the 60s!

In this video, I'm introducing my research project (Tailslayer) that immensely reduces p99.99 latency on traditional RAM!

1 week ago 185 40 3 7
Release LexicMap v0.9.0 · shenwei356/LexicMap v0.9.0 - 2026-03-13 New commands: lexicmap utils 2sam: Convert the default search output to SAM format (#26). Attention: This command requires search results generated by the current LexicMap ver...

LexicMap v0.9.0 has been released with

- a few bug fixes: CIGAR, bitscore and evalue calculation.
- new features: better support for big genomes like human.
- a new command to convert output to SAM format

github.com/shenwei356/L...

1 month ago 8 1 0 0

Thank you! Its original goal was to help beginners do some simple but common tasks. Before that, I shared Perl and Python scripts to help students, but they struggled to install the Perl modules. At that time, Go 1.0 was released, with a great feature of easily providing executable binaries.

1 month ago 4 0 0 0

Thank you, James! Glad you like it.

1 month ago 2 0 0 0
Release SeqKit v2.13.0 (10-year-old birthday version) · shenwei356/seqkit Changelog SeqKit is 10 years old! SeqKit v2.13.0 - 2026-02-28 seqkit: add support for reading and writing LZ4 compression format. new command: seqkit sample2: improved seqkit sample by @stahiga....

Can't wait to release a 10-year-old birthday version for SeqKit!

- 10 years
- 2 papers, 3500 citations
- 20 contributors
- 40 subcommands
- 880 commits
- 500 issues
- 685.5K Bioconda total downloads

Thank you all, dear contributors and users!
I'll keep maintaining it.

github.com/shenwei356/s...

1 month ago 125 35 6 1

I should have said, this takes us to about 2.8 million genomes in total. We don't have annotations etc. for the latest data yet; this will be an ongoing process.

1 month ago 14 7 1 0
Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny - Nature Methods This Resource paper presents a global SARS-CoV-2 phylogenetic tree of 4,471,579 high-quality genomes consistently constructed by Viridian, an efficient amplicon-aware assembler.

A long time ago in a galaxy far away, there was a SARS-CoV-2 pandemic. Our paper, led by @martibartfast.bsky.social
a) correcting errors in 4.5 million genomes & their phylogeny
b) improving representation of the Global South in public data
www.nature.com/articles/s41...
(thread 1/n)

2 months ago 137 66 3 6

> We will change several names that have long been in use (e.g., Firmicutes, Proteobacteria) to newly formalized names (e.g., Bacillota, Pseudomonadota) that may be unfamiliar to some.

ncbiinsights.ncbi.nlm.nih.gov/2022/11/14/p...

2 months ago 1 0 1 0

Phold's manuscript is now available @narjournal.bsky.social thanks to @susiegriggo.bsky.social @npbhavya.bsky.social @vijinim.bsky.social @linsalrob.bsky.social @martinsteinegger.bsky.social @milot.bsky.social @eunbelivable.bsky.social & others not on bsky #phagesky academic.oup.com/nar/article/...

3 months ago 84 44 1 1

For those of us interested in software development, data structure design etc in science, this is a must-read. A taste of what is happening in communities letting AI agents go wild writing code, creating PRs, writing documentation : spoiler - humans get addicted, lose perspective, slop everywhere.

3 months ago 21 6 0 0

Phage therapy of perinephric abscess in kidney transplantation recipients caused by drug‐resistant Pseudomonas aeruginosa - Liu - 2025 - mLife - Wiley Online Library onlinelibrary.wiley.com/doi/10.1002/...

3 months ago 3 1 0 0
Scikit-bio: a fundamental Python library for biological omic data analysis - Nature Methods

The scikit-bio paper is online in Nature Methods! Many thanks to our collaborators, community contributors and reviewers! We couldn’t have done it without you. www.nature.com/articles/s41... #Bioinformatics #OpenSource

4 months ago 97 51 3 0

So did I, but I'm serious this time.

4 months ago 2 0 0 0

Yes, finally! I bought the first Rust Book in 2019 ...

4 months ago 2 0 1 0
Performance on reading and writing plain FASTA/Q files


My first Rust toy tool (just for learning purposes)!!!
The performance (both time and memory) is good!

Rust is hard to learn, and I still need more practice!
github.com/shenwei356/f...

4 months ago 27 2 2 0

Most exciting study I have seen for ages, and Fernando the most excited speaker. Much anticipated. Highly recommended, a lot of food for thought (and quite a dense paper - lots to think about)

5 months ago 18 5 1 0
MVIF 44


It's Monday!
...and a new #MVIF program is out! 🤩

Free registration: cassyni.com/s/mvif-44

⭐️ Highlights:
🇺🇸 Vanessa Hale
🇰🇷 Jun Hyung Cha

⭐️ Keynote:
🇺🇸 Katherine Lemon @kathlemon.bsky.social

⭐️ Talks:
🇺🇸 Meenakshi Chakraborty
🇨🇳 Wei Shen @shenwei356.bsky.social
🇺🇸 Johanna Gutleben

5 months ago 4 4 0 4
Phage Foundry

📣 New preprint from us at phagefoundry.org 📣
A solid machine learning framework to predict strain-level phage-host interactions across diverse bacterial genera from genome sequences alone. Avery Noonan from the Arkin Lab led this massive effort
www.biorxiv.org/content/10.1...

5 months ago 27 17 1 0

Honoured and quite blown-over to receive this award. I have been, and continue to be, very lucky - first with great mentors, and then really prodigious students, postdocs and collaborators. Working with them has been a joy.

5 months ago 198 19 42 2
Genome size estimation from long read overlaps. Abstract. Motivation: Accurate genome size estimation is an important component of genomic analyses such as assembly and coverage calculation, though existin

Our method for genome size estimation from long-read overlaps is now published 🥳
academic.oup.com/bioinformati...

5 months ago 37 16 1 1

Thread on #GI2025 's second day! 👇🏻

5 months ago 11 5 0 0

Ben Langmead @benlangmead.bsky.social delivers the official opening for this year's Genome Informatics Conference #GI2025 at Cold Spring Harbor Laboratory.
List of talks and posters: meetings.cshl.edu/abstracts.as...

5 months ago 34 7 1 0

Cool new paper from Lorién López-Villellas, @santiagomarco.bsky.social and others!

Super cute and simple idea:
In Gotoh's affine-cost alignment, only the M matrix is needed during tracing: we can just search for a gap-length x such that M[i][j] = M[i-x][j]+o+x*e or M[i][j] = M[i][j-x]+o+x*e.
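
A minimal Python sketch of that idea, not the paper's implementation. It assumes cost minimisation, integer costs, and that M[i][j] holds the best cost over all end states; the function name `gotoh_m_only`, the mismatch cost and the default values of `o` and `e` are made up for illustration. Only M is stored in full, and the traceback searches for the gap length x directly in M.

```python
def gotoh_m_only(a, b, mismatch=4, o=6, e=2):
    """Global affine-gap alignment; a gap of length x costs o + x*e.

    Returns (cost, ops) with ops made of '=' (match), 'X' (mismatch),
    'D' (base of a against a gap) and 'I' (base of b against a gap).
    All names and default costs here are illustrative assumptions.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    # M[i][j]: best cost of aligning a[:i] with b[:j], whatever the last column is.
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for j in range(1, m + 1):
        M[0][j] = o + j * e
    for i in range(1, n + 1):
        M[i][0] = o + i * e

    # Gap states are kept only as rolling values, never as full matrices.
    D = [INF] * (m + 1)          # per column: best cost ending in a vertical gap
    for i in range(1, n + 1):
        I = INF                  # best cost ending in a horizontal gap in this row
        for j in range(1, m + 1):
            D[j] = min(D[j] + e, M[i - 1][j] + o + e)
            I = min(I + e, M[i][j - 1] + o + e)
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = min(M[i - 1][j - 1] + sub, D[j], I)

    # Traceback using M alone.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            if M[i][j] == M[i - 1][j - 1] + sub:
                ops.append("=" if sub == 0 else "X")
                i, j = i - 1, j - 1
                continue
        # Look for x with M[i][j] = M[i-x][j] + o + x*e (vertical gap).
        x = next((x for x in range(1, i + 1)
                  if M[i][j] == M[i - x][j] + o + x * e), None)
        if x is not None:
            ops.append("D" * x)
            i -= x
            continue
        # Otherwise look for x with M[i][j] = M[i][j-x] + o + x*e (horizontal gap).
        x = next(x for x in range(1, j + 1)
                 if M[i][j] == M[i][j - x] + o + x * e)
        ops.append("I" * x)
        j -= x
    return M[n][m], "".join(reversed(ops))
```

The gap-length search costs extra time per gap during tracing, in exchange for not storing the two gap matrices.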

5 months ago 8 2 1 0

I also have serious concerns about the consolidation of roles (one person is now publisher, chief editor, and also a frequent author) as exemplified in a recent paper that was fast-tracked for publication.

5 months ago 6 1 1 0

Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.

5 months ago 20 15 2 0

Around 10% of your Nanopore reads (SQK-RBK114) are incorrectly trimmed. Here is why, and how our new tool Barbell solves it:

www.biorxiv.org/content/10.1...

Want to get started? github.com/rickbeeloo/b...

5 months ago 51 31 3 4
GitHub - mohsenzakeri/Movi: Fast, Cache-Efficient, and Scalable Queries on Pangenomes

1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster. github.com/mohsenzakeri...

5 months ago 44 24 1 2
How the Vectors of Antibiotic Resistance Have Evolved - Professor Zamin Iqbal YouTube video by Milner Centre for Evolution

Podcast with me and @turiking.bsky.social for the @milnerevolution.bsky.social series, on plasmid evolution over the last 100 years, talking about our ( @cazares-adr.bsky.social , Nick Thomson, @sarah1alexander.bsky.social & co) recent paper www.science.org/doi/10.1126/...
youtu.be/Mzr3TD4ijs0?...

6 months ago 44 16 1 1

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

7 months ago 114 80 5 5