
Posts by Daniel Anderson

Agent-Guided De Novo Design of Nanobody Binders Against a Novel Cancer Target www.biorxiv.org/content/10.64898/2026.04...

5 days ago 0 1 0 0

Ecology of metagenomes: incorporating genotype-to-phenotype maps into ecological models www.biorxiv.org/content/10.6... #jcampubs

1 week ago 14 2 0 0

TFBindFormer: A Cross-Attention Transformer for Transcription Factor-DNA Binding Prediction www.biorxiv.org/content/10.64898/2026.04...

1 week ago 2 1 0 0

gbdraw: a genome diagram generator for microbes and organelles www.biorxiv.org/content/10.64898/2026.04...

1 week ago 4 4 0 0

Happy to have made a small contribution to @gbouras13.bsky.social's study. Hope this is just one of many examples of independent researchers collaborating with academic scientists!

#archaeasky folks, this pipeline rules for global baseline annotation for archaea

doi.org/10.64898/202...

💻🧬🧫🦠

2 weeks ago 25 8 2 0

Helicase: Vectorized parsing and bitpacking of genomic sequences www.biorxiv.org/content/10.64898/2026.03...

1 month ago 17 5 0 1

10-minimizers: a promising class of constant-space minimizers www.biorxiv.org/content/10.64898/2026.03...

1 month ago 8 3 0 2

As a bioinformatician, it doesn’t get more exciting than this!

Scaling to 1T+ genes isn’t just collecting more data. It demands new algorithms, infrastructure, and abstractions for complex biology.

Excited to be working with an incredible team to push the boundaries of sequence informatics.

1 month ago 4 2 0 0

Sensitive and scalable metagenomic classification using spaced metamers, reduced alphabets, and syncmers www.biorxiv.org/content/10.64898/2026.03...

1 month ago 13 8 0 1

Our new preprint on quantifying microbial sample diversity/complexity in a way that accounts for both metagenome architecture and taxonomic composition is now live on bioRxiv: www.biorxiv.org/content/10.6...

#metagenomics #bioinformatics #dataanalysis #graphdata

1 month ago 14 8 2 1

Splicer: Phylogenetic Placement in Sub-Linear Time www.biorxiv.org/content/10.64898/2026.02...

2 months ago 1 1 0 0

Huge congratulations @martibartfast.bsky.social and @zaminiqbal.bsky.social on the publication of this fantastic and massive paper. A huge achievement!

2 months ago 4 0 1 0

Embarrassingly_FASTA: Enabling Recomputable, Population-Scale Pangenomics by Reducing Commercial Genome Processing Costs from $100 to less than $1 www.biorxiv.org/content/10.64898/2026.02...

2 months ago 0 1 0 0

So anyway:
BiRank & QuadRank: single-cache-miss rank queries that have double the throughput of other Rust crates and fully saturate the memory bandwidth.
Side effect: QuadFm is smaller and 2-4x faster than the next-best FM-index.

github.com/RagnarGrootK...

raw.githubusercontent.com/RagnarGrootK...

2 months ago 18 9 2 0

Very proud to have played a small part in this important work!

3 months ago 1 0 0 0

EDEN: a family of genomic language models trained on up to 9.7 trillion nucleotides from @basecamp-research.bsky.social's BaseData can design large serine recombinases, bridge recombinases, and antimicrobial peptides.

www.biorxiv.org/content/10.6...

Happy to have played a small part in this!

3 months ago 18 5 0 0

Rapid and Consistent Genome Clustering for Navigating Bacterial Diversity with Millions of MAGs and Isolates www.biorxiv.org/content/10.64898/2025.12...

3 months ago 0 1 0 0

Rewriting protein alphabets with language models www.biorxiv.org/content/10.1101/2025.11....

4 months ago 0 1 0 0

Deciphering enzymatic potential in metagenomic reads through DNA language model www.biorxiv.org/content/10.1101/2024.12....

1 year ago 1 1 0 0

A General Transformer-Based Multi-Task Learning Framework for Predicting Interaction Types between Enzyme and Small Molecule www.biorxiv.org/content/10.1101/2025.10....

6 months ago 2 2 0 0
RemoteFoldSet: Benchmarking Structural Awareness of Protein Language Models
Protein language models (pLMs) have the capacity to infer structural information from amino acid sequences. Evaluating the extent to which structural information they truly encode is crucial for asses...

RemoteFoldSet: Benchmarking Structural Awareness of Protein Language Models
Zinnia Ma, Neville P. Bethel
bioRxiv 2025.09.23.678152; doi: doi.org/10.1101/2025...

6 months ago 10 2 0 0
High-accuracy SNV calling for bacterial isolates using deep learning with AccuSNV
Accurate detection of mutations within bacterial species is critical for fundamental studies of microbial evolution, reconstructing transmission events, and identifying antimicrobial resistance mutati...

Precisely calling mutations across hundreds of bacterial isolates has been hard, requiring manual filtering and expertise.

Until now, using AccuSNV.

Herui Liao trained an ML model based on our previous meticulously called SNVs.
www.biorxiv.org/content/10.1...

6 months ago 72 34 2 1

Now published in @natcomms.nature.com 🎉

www.nature.com/articles/s41...

With Gillian Rodger, @nstoesser.bsky.social, @samlipworth.bsky.social, @stat-sarah.bsky.social, and many others!

6 months ago 21 14 0 0
Machine learning for biosecurity: A probabilistic framework for invasive species management
By using pre-introduction traits and leveraging ML for early detection, this study presents a scalable, data-driven framework for invasion risk assessment and conservation planning. Our approach enab...

Machine learning for biosecurity: A probabilistic framework for invasive species management. Journal of Applied Ecology, 00, 1–13. doi.org/10.1111/1365...

6 months ago 1 1 0 0
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching
We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...

Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...

6 months ago 25 21 2 0
How to rapidly search the world’s microbial DNA
By making the world’s microbial DNA easier to explore, LexicMap helps researchers track outbreaks, study antibiotic resistance, and understand microbial diversity.

There are millions of openly available microbial genomes, but searching them can be slow.

Until now 🥁

Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.

www.ebi.ac.uk/about/news/r...
🦠

6 months ago 41 16 1 1
Compression of protein secondary structures enables ultra-fast and accurate structure searching
Protein structure prediction has undergone a revolution with the advent of AI-based algorithms, such as AlphaFold and RoseTTAFold. As a result, over 200 million predicted protein structures have been...

"We show that, despite this compression factor, SSEs can be used as a highly effective tertiary structure comparison tool, with accuracy that approaches that of Foldseek, while offering a 200-fold speedup."

www.biorxiv.org/content/10.1...

7 months ago 19 10 0 0
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.

Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...

7 months ago 190 99 5 4

Couldn’t have said it better myself!

11 months ago 4 1 0 0