
Posts by Phil Ewels

Did you miss the registration deadline for the Boston #NextflowSummit? Fear not - all talks will be live streamed online! 🎉

The hybrid event will use the same platform as the October 2025 event: you can watch live and ask the speakers questions. Hit the link below to register!

1 day ago 0 0 0 0

Heng Li's blog posts are always thoughtful and well written, and this one is no exception.

Great to see folks coming to similar conclusions around #Rust #rewrites in bioinformatics.

Also appreciate the RustQC / rewrites.bio cite 🙏🏻

3 days ago 1 0 0 0

Your Bluesky bio says Blology, I quite like that 😅 Like, the computational analysis of glass blowing or something..

5 days ago 0 0 0 0

Coming next month: besteditor.bio - a manifesto on how to configure vim key bindings for optimal bioinformatics analysis..

1 week ago 6 0 1 0

I don't think I advocated ignoring any licenses anywhere? Rather the opposite: "Check licenses before you start", meaning check that they are compatible. I can look into rewriting this section if my intended meaning did not come across well.

1 week ago 0 0 1 0
GitHub - seqeralabs/RustQC: Fast genomics quality control tools for sequencing data, written in Rust.

The rewrite is not the nf-core pipeline, it's a separate project. The pipeline just calls a binary. The pipeline code is and always was MIT. The rewritten tool code I released as GPL-3, matching the strictest (and mostly consensus) license of the upstream software tools.

github.com/seqeralabs/R...

1 week ago 0 0 1 0

haha, happy to be of service 🤣

1 week ago 2 0 1 0

Yes I've found the same - and tests are crucial in this context. I was aiming for an @nf-co.re / #rnaseq pipeline drop-in replacement so I ended up using the test suite we have there. It has output snapshots etc so is specific and easy to iterate with. Can always have more edge cases though..

1 week ago 0 0 0 0

The nf-core Hackathon in Boston is happening in just over 2 weeks! 💻 Join us April 28–29 for two days of collaborative hacking. 🎟️ Register now: hubs.la/Q04brb4c0

Whether you want to add new features to existing pipelines, work on tooling, or tackle community initiatives - there's something for you.

1 week ago 7 5 1 1

This is one of several things that this project taught me: being strict and comprehensive on the testing, as early as possible, is absolutely key (and avoids a lot of pain later). I think this is much of what differentiates a high-quality rewrite from a poor one, and is also key for trust + adoption.

1 week ago 1 0 0 0
GitHub - seqeralabs/RustQC-benchmarks: Benchmark suite for validating RustQC outputs against upstream bioinformatics tools

Yeah I agree, this has always been the case but it's especially important now. I started with these 2 files and extended to a pipeline (github.com/seqeralabs/R...) and later the nf-core/rnaseq pipeline tests. There are a fair number of unit tests as well. Definitely could be improved though.

1 week ago 1 0 1 0
rewrites.bio - A manifesto for bioinformatics Principles for responsible AI-assisted rewriting of bioinformatics tools.

This conclusion is what drove me to put rewrites.bio together. A desire to get ahead of this rapid change and educate / demonstrate best practices and define how we want people to work with these new tools.

1 week ago 0 0 1 0

I've seen the same as well. This isn't a problem with LLMs though (especially as their quality improves), it's a problem with the people using them. That's what I would like to help improve. Not just cover our eyes and hope that people won't use LLMs.

1 week ago 1 0 2 0
dupRadar RustQC's dupRadar output files, including the duplication matrix, fitted model parameters, diagnostic plots, and benchmark comparisons.

Yup, documentation of the validation is essential for user trust. I tried to be cautious with my language: differences in RustQC are typically at the 14th decimal place or similar. It's all detailed on the docs pages: seqeralabs.github.io/RustQC/rna/d...

It's also encoded in the CI snapshots.

1 week ago 1 0 1 0
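
To illustrate what "differences at the 14th decimal place" means for validation, here is a minimal sketch of a tolerance-based comparison between an upstream tool's output and a rewrite's output. It is not RustQC's actual validation code: the `outputs_match` helper, the tab-separated layout, the column names, and the tolerance values are all assumptions made for illustration.

```python
import math


def outputs_match(upstream: str, rewrite: str,
                  rel_tol: float = 1e-12, abs_tol: float = 1e-12) -> bool:
    """Hypothetical helper: compare two tab-separated outputs field by field.

    Fields that are textually identical pass immediately. Fields that differ
    must both parse as floats and agree within the given tolerances.
    """
    up_lines = upstream.strip().splitlines()
    rw_lines = rewrite.strip().splitlines()
    if len(up_lines) != len(rw_lines):
        return False
    for up_line, rw_line in zip(up_lines, rw_lines):
        up_fields = up_line.split("\t")
        rw_fields = rw_line.split("\t")
        if len(up_fields) != len(rw_fields):
            return False
        for a, b in zip(up_fields, rw_fields):
            if a == b:
                continue  # exact textual match, nothing more to check
            try:
                x, y = float(a), float(b)
            except ValueError:
                return False  # non-numeric fields must match exactly
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# Made-up example data: the rewrite differs only far beyond the tolerance.
upstream_out = "gene\tdupRate\nG1\t0.12345678901234567\n"
rewrite_out = "gene\tdupRate\nG1\t0.12345678901234561\n"
print(outputs_match(upstream_out, rewrite_out))
```

A check like this would typically sit alongside exact snapshot comparisons, with the tolerance documented so users can see how "functionally identical" is defined.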

I'm not sure why use of LLMs indicates such an expectation. In fact I could imagine a future where the inverse is true - more people are empowered to help with maintenance. There are problems with this (slop, review burden etc). But the motivations for better results and software remain the same.

1 week ago 3 0 2 0

This approach won't work for everyone. Some will want new / different functionality which breaks the model. Some won't validate thoroughly enough. There are many risks. But there are also a lot of practical benefits if it's done well.

1 week ago 0 0 0 0

For RustQC (in nf-core/rnaseq) we will use continuous integration snapshot tests to keep confidence that outputs remain identical (or functionally identical at least) to the outputs created by the original tools. Just 60x faster.

1 week ago 0 0 2 0

The approach described there (and for RustQC) is to use large and precise validation against upstream tools and to keep them "hot swappable". So the upstream tools are unaffected and maintenance efforts continue there. The rewrite is for performance purposes only and tracks upstream changes.

1 week ago 0 0 1 0
rewrites.bio - A manifesto for bioinformatics Principles for responsible AI-assisted rewriting of bioinformatics tools.

This is the uncomfortable truth I'm keen to address. I would like to accelerate us towards some best practices before we fall into this trap. That was my hope for rewrites.bio - to start a discussion basically.

1 week ago 1 0 2 0

Very very important initiative, check it out #bioinformaticians!

1 week ago 10 3 0 0

AI coding assistants just passed a threshold: domain experts can now rewrite established scientific software in days.

This wave of rewrites is coming for #bioinformatics. The question isn't whether to embrace this capability, but how to do it responsibly. 🧵 hubs.la/Q04b85ww0

1 week ago 3 1 1 0

This approach makes tests / benchmarks essential, and they effectively *have* to be enough. This is where trusting the upstream tool comes in. It's very easy to do huge numbers of comparisons to that, so if we understand that code, then the rewrite can inherit that trust.

1 week ago 0 0 0 0

It's of course better if you read and understand the code. My argument here is that it's no longer a requirement with LLMs. You can discuss the code with an LLM, ask questions, make changes - all without understanding the syntax. Choose language based on its features, not just your experience.

1 week ago 0 0 1 0

In the end I decided that this was a case-by-case scenario and I should leave it as a light-touch in rewrites.bio. But I agree that it's one of, if not *the* most contentious issue about LLM rewrites. And I don't have a good answer myself as to how we should deal with it.

2 weeks ago 1 0 0 0

Also worth noting that most tool maintainers probably don't want a PR that deletes all their code and has 100k lines of new code in a language that they don't know. Contributions should be helpful, not burdensome (as far as is possible).

2 weeks ago 2 0 1 0

If multiple tools are emulated in one package, how does one contribute that back upstream? And if not by code directly, it's probably of limited use, as the goal is exact replication - not any new insight or features.

If it's a 1:1 tool rewrite then I totally agree that every effort should be made.

2 weeks ago 0 0 1 0

In the first draft of the site I did have a point on this, but I felt a little unsure about it so I scaled it back to just bug reports (point 5.4). It can be complex if a rewrite incorporates multiple tools (as suggested in 3.1 and done in RustQC).

2 weeks ago 0 0 1 0
Credits & Citation Credits, citations, and acknowledgments for the tools and libraries that RustQC builds upon.

I totally agree - the very first point on rewrites.bio is about academic integrity: "Credit the original authors".

I wasn't suggesting that rewrites should be published. People should cite the underlying tools, e.g. seqeralabs.github.io/RustQC/about...

2 weeks ago 2 0 1 0

This concept has come up in several different comment threads. It's not really something I thought about during the project, but I think it's really interesting!

2 weeks ago 2 0 0 0
rewrites.bio: Principles for AI-assisted Modernization of Scientific Software | Seqera A wave of AI-driven tool rewrites is coming to bioinformatics. We've published a set of best-practices principles to try to help people to approach rewrites in the right way.

🔍 Discover principles for rewriting tools with AI: seqera.io/blog/rewrite...

2 weeks ago 2 2 0 0