Turns out that the usual NtHash is not as random as one might think?!?! At least not for minimizers.
Seq-hash (and simd-minimizers) already has this fixed by default ;)
github.com/rust-seq/seq...
Posts by Igor Martayan
New blog post!
I use ntHash all the time to hash k-mers, yet it turns out it has some unexpected flaws (collision propagation, bias on leading zeros...). The good news: each of them can be fixed!
igor.martayan.org/posts/breaki...
The fastest way to match characters on ARM processors? lemire.me/blog/2026/04/19/the-fast...
I think this blog post comes closer to my current thinking on AI than any other I've read: fransskarman.com/im_not_using...
Not nearly as polished, but I'm currently writing my thesis and it overlaps many of these topics
@ksahlin.bsky.social any news about the Recomb-Seq reviews?
I'm not looking forward to a future where all the tools are being vibe-rewritten into languages people don't want to learn. Who will maintain all this? The original maintainers won't. It's not the language they were comfortable with. Does the prompter understand the tool well enough?
I could also add some postprocessing to replace @misc by @article for arxiv outputs.
The original bibtex is actually produced by doi.org itself (see tex.stackexchange.com/questions/68...) but I do some postprocessing to handle special characters and do small modifications.
I kept being rate limited by doi2bib, so I made a small CLI to replace it locally: github.com/imartayan/bi...
It can output bibtex from DOIs and arxiv/biorxiv links, redirect to the published version when it's available and copy the result to your clipboard
Modern DRAM is based on a brilliant design from IBM.
But, we're still paying for a latency penalty that's existed since the 60s!
In this video, I'm introducing my research project (Tailslayer) that immensely reduces p99.99 latency on traditional RAM!
Yeah; I have a bunch of incoherent thoughts about this...
1) maintaining bindings is annoying and purely a service. I already have the rust code and don't need bindings myself.
2) somebody needs to own them. Typically the main dev (me), even though someone else is the user and expert.
(1/10)
2. I don't mind if people don't want to use Rust but instead of rewriting it from scratch every single time, why don't you write bindings and make them available for everyone? This is both easier and more maintainable, and would actually be helpful for the community.
1. We keep improving these libraries, fixing bugs and adding new features regularly. Your code will likely be outdated in a few months.
A quick rant on people vibe-translating our Rust libraries to other languages
That's the second time in a week that I've seen new bioinformatics tools with a vibe-coded translation of our Rust libraries to C/C++.
I have two major issues with that:
Myloasm, our long-read metagenome assembler, is now published! w/ @mgmarin.bsky.social and @lh3lh3.bsky.social
Very rewarding after > a year of development and countless hours thinking about assembly. Thanks to beta testers, Li lab, and reviewers who gave very helpful feedback.
rdcu.be/famFj
A run-length-compressed skiplist data structure for dynamic GBWTs supports time and space efficient pangenome operations over syncmers www.biorxiv.org/content/10.64898/2026.03...
Why phylogenies compress so well: combinatorial guarantees under the Infinite Sites Model www.biorxiv.org/content/10.64898/2026.03...
Lucas Czech: Fast Iteration of Spaced k-mers https://arxiv.org/abs/2603.25417 https://arxiv.org/pdf/2603.25417 https://arxiv.org/html/2603.25417
Amazing talk, I didn't know about left-right structures!
Damn, your swap is as large as my entire RAM!
Niceee, was waiting for this one, and already used it in a few places 😆
Congrats @imartayan.bsky.social et al :)
Super Bloom: Fast and precise filter for streaming k-mer queries www.biorxiv.org/content/10.64898/2026.03...
Kamila Szewczyk, Sven Rahmann: Hecate: A Modular Genomic Compressor https://arxiv.org/abs/2603.15390 https://arxiv.org/pdf/2603.15390 https://arxiv.org/html/2603.15390
Sensitive and scalable metagenomic classification using spaced metamers, reduced alphabets, and syncmers www.biorxiv.org/content/10.64898/2026.03...
It's a good day when the first item in your feed is your own work :)
@rickbitloo.bsky.social was annoyed that scanning reads for all 96 rapid kit barcodes is a bottleneck in Barbell, so he made Sassy2: 13x (150bp) to 4.6x (8kbp) faster than v1 by batch-searching patterns, and >100Gbp/s on 16 threads!
⏰ Last chance! Register your submission with a placeholder abstract for #RECOMBSeq TODAY. Final paper due March 15, 2026, 23:59 AoE
Submit now 👉 recomb-seq.github.io/seq2026/call...
Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data.
The title says it all: "RNA-seq analysis in seconds using GPUs" now on biorxiv www.biorxiv.org/content/10.6... and github github.com/pachterlab/k...
Figure 1 shows the key result
I got nerd-sniped into writing some code for Customizable Contraction Hierarchies, which is an insanely cool and super simple data structure for fast route planning.
Nice survey paper by Michael Zündorf et al: arxiv.org/abs/2502.10519
My unpolished implementation notes: curiouscoding.nl/posts/cch/
Deacon can now run in the browser using WebAssembly. Sequence data never leaves your machine. It currently supports FASTA/Q filtering using indexes up to 1GB in size.
Demo: bede.im/deacon