Agent-Guided De Novo Design of Nanobody Binders Against a Novel Cancer Target www.biorxiv.org/content/10.64898/2026.04...
Posts by Daniel Anderson
Ecology of metagenomes: incorporating genotype-to-phenotype maps into ecological models www.biorxiv.org/content/10.6... #jcampubs
TFBindFormer: A Cross-Attention Transformer for Transcription Factor–DNA Binding Prediction www.biorxiv.org/content/10.64898/2026.04...
gbdraw: a genome diagram generator for microbes and organelles www.biorxiv.org/content/10.64898/2026.04...
Happy to have made a small contribution to @gbouras13.bsky.social's study. Hope this is just one of many examples of independent researchers collaborating with academic scientists!
#archaeasky folks, this pipeline rules for global baseline annotation for archaea
doi.org/10.64898/202...
💻🧬🧫🦠
Helicase: Vectorized parsing and bitpacking of genomic sequences www.biorxiv.org/content/10.64898/2026.03...
10-minimizers: a promising class of constant-space minimizers www.biorxiv.org/content/10.64898/2026.03...
As a bioinformatician, it doesn’t get more exciting than this!
Scaling to 1T+ genes isn’t just collecting more data. It demands new algorithms, infrastructure, and abstractions for complex biology.
Excited to be working with an incredible team to push the boundaries of sequence informatics.
Sensitive and scalable metagenomic classification using spaced metamers, reduced alphabets, and syncmers www.biorxiv.org/content/10.64898/2026.03...
Our new preprint on quantifying microbial sample diversity/complexity in a way that accounts for both metagenome architecture and taxonomic composition is now live on bioRxiv: www.biorxiv.org/content/10.6...
#metagenomics #bioinformatics #dataanalysis #graphdata
Splicer: Phylogenetic Placement in Sub-Linear Time www.biorxiv.org/content/10.64898/2026.02...
Huge congratulations @martibartfast.bsky.social and @zaminiqbal.bsky.social on the publication of this fantastic and massive paper. A huge achievement!
Embarrassingly_FASTA: Enabling Recomputable, Population-Scale Pangenomics by Reducing Commercial Genome Processing Costs from $100 to less than $1 www.biorxiv.org/content/10.64898/2026.02...
So anyway:
BiRank & QuadRank: single-cache-miss rank queries that are double the throughput of other Rust crates and fully saturate the memory bandwidth.
Side effect: QuadFm is smaller and 2-4x faster than the next-best FM-index.
github.com/RagnarGrootK...
raw.githubusercontent.com/RagnarGrootK...
Very proud to have played a small part in this important work!
EDEN: a family of genomic language models trained on up to 9.7 trillion nucleotides from @basecamp-research.bsky.social's BaseData can design large serine recombinases, bridge recombinases, and antimicrobial peptides.
www.biorxiv.org/content/10.6...
Happy to have played a small part in this!
Rapid and Consistent Genome Clustering for Navigating Bacterial Diversity with Millions of MAGs and Isolates www.biorxiv.org/content/10.64898/2025.12...
Rewriting protein alphabets with language models www.biorxiv.org/content/10.1101/2025.11....
Deciphering enzymatic potential in metagenomic reads through DNA language model www.biorxiv.org/content/10.1101/2024.12....
A General Transformer-Based Multi-Task Learning Framework for Predicting Interaction Types between Enzyme and Small Molecule www.biorxiv.org/content/10.1101/2025.10....
RemoteFoldSet: Benchmarking Structural Awareness of Protein Language Models
Zinnia Ma, Neville P. Bethel
bioRxiv 2025.09.23.678152; doi: doi.org/10.1101/2025...
Precisely calling mutations across hundreds of bacterial isolates has been hard, requiring manual filtering and expertise.
Until now, using AccuSNV.
Herui Liao trained an ML model based on our previous meticulously called SNVs.
www.biorxiv.org/content/10.1...
Now published in @natcomms.nature.com 🎉
www.nature.com/articles/s41...
With Gillian Rodger, @nstoesser.bsky.social, @samlipworth.bsky.social, @stat-sarah.bsky.social, and many others!
Machine learning for biosecurity: A probabilistic framework for invasive species management. Journal of Applied Ecology, 00, 1–13. doi.org/10.1111/1365...
Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...
There are millions of openly available microbial genomes, but searching them can be slow.
Until now 🥁
Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.
www.ebi.ac.uk/about/news/r...
🦠
"We show that, despite this compression factor, SSEs can be used as a highly effective tertiary structure comparison tool, with accuracy that approaches that of Foldseek, while offering a 200-fold speedup. "
www.biorxiv.org/content/10.1...
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Couldn’t have said it better myself!