Love it, thanks Phil! And huge props on both RustQC and rewrites.bio!
Posts by Nick Minor
Worth pointing out another practice this project follows that’s MUCH more accessible with AI:
When you announce your new tool, why not ship it with a little Astro site for a landing page and docs?
Astro is great. And agents are great at it.
I definitely think it’s a “yes-and” in RustQC’s case, @ewels.bsky.social! What you needed was definitely a CLI. And also! A library crate (or several) focused on providing nice projections of sequencing data for QC purposes could be valuable in its own right too.
A highly underrated approach to that end is to do the rewrite not to produce an equivalent command-line app, but to extract a library with the right abstractions so people can build equivalent and better apps themselves.
I think we really undervalue libraries in bioinformatics, to our peril.
Honestly Phil as soon as I posted that nit I was like, “yeah but what does he replace it with?” That’s a tough one.
The only ideas I have are “samtools stats suite”, “samtools stats commands”, or “samtools flagstat/idxstats/stats”. None are great 🤔
Totally agree something has shifted. We used Claude Code extensively for fgumi and it genuinely changed our velocity on the port from Scala to Rust. The key was having deep domain expertise to guide it and catch the subtle stuff. AI as a force multiplier for experienced devs is very real now.
An AI rewrite AND a manifesto about AI rewrites in the same announcement is about as 2026 as it gets
This looks amazing Phil! Let the Rust rewrites keep coming😎
One tiny nit: I had a big double take at the headline about what it reimplements including samtools (which would be massive)! If it’s not a full reimplementation of samtools, I might reword that to say which subcommand it reimplements.
Didn’t mean to imply that you hadn’t!
I was more wondering whether the 3-year-maintenance contract would make trying a new tool an easier sell for labs maintaining projects like, say, mmseqs2. But perhaps I’m overestimating how much maintenance work these projects actually need!
Having a discussion about maintenance at all (let alone for a set amount of time) is a real selling point!
I could see a knock-on being that it’s actually easier to recommend students make their own tool instead of working on the lab’s core offering just because that’s what the lab maintains.
Yes that is fair! And maybe also Rust people just aren’t used to being on the other side of the “rewrite-it-in-X” table!
In that context, I think rewriting and not upstreaming absolutely is the sensible default. I guess I just hope the Rust bioinformatics ecosystem in particular is moving away from that and toward more organization.
Academia is an interesting, peculiar case here. Academic open source has long had different norms because we're incentivized to put out open source prototypes for publication, but almost disincentivized to maintain them. Many academic repos are thus never touched after publication.
That said, like you, I'm of two minds here @curiouscoding.nl. While I do think contributing is the right default and should be praised, open source should equally be whatever people want it to be. Hard to be too confidently normative in either direction.
100%. The good-faith, prosocial thing to do is to help out with another project that's already growing rather than dilute the ecosystem with something dubiously equivalent.
The annoying task that is providing C bindings yourself is kind of a bastion against the slop fork in this narrow case.
I've been thinking a lot about providing cbindgen bindings for my Rust libraries like @curiouscoding.nl did for sassy. Although somehow I feel like even that won't stop C++ (or eventually Zig) slop forks...
You and me both friend github.com/nrminor/rosa...
I’ll add that Rosalind is a fabulous gateway into learning any language.
Also, built-in features like exporting as static HTML apps, inline docs, inline PEP 723 dependencies with uv, format-on-save with Ruff, SQL queries with DuckDB, UI elements like a data frame explorer that support pandas and polars, etc. are all killer for data science.
Extremely good.
I’m skeptical that Jupyter notebooks are as un-reproducible as some of the claims out there, but if reproducibility is one of your priorities, I don’t see why you wouldn’t default to Marimo.
Wait is that literally the entirety of GenBank from 1994?