Pangenomes, but scalable.
Panmap: phylogeny-guided framework for read alignment, genotyping, and sample placement on pangenomes. 600x smaller indexes, faster builds, and placement from 20K to 8M genomes. @amkram.bsky.social @alanbyzhang.bsky.social @russcd.bsky.social
www.biorxiv.org/content/10.6...
Posts by Russ Corbett-Detig
eDNA is possibly the coolest application! Below: read-level placement across a pan-vertebrate mitochondrial pangenome (left) where a clear cluster of read assignments supports the presence of many mammoth reads (right). Processing ~9 billion reads took just 8 minutes!
Even in complex mixtures, Panmap does a really good job of sorting out their identities and relative contributions to the sample.
Panmap can also evaluate sample mixtures (e.g., what you get from wastewater or eDNA) by scoring the reads separately against every node and then estimating the abundances of lineages via EM.
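To make the EM step concrete, here is a minimal sketch of abundance estimation for a mixture. It is not Panmap's code: the function name `em_abundances`, the likelihood matrix, and the iteration count are all assumptions for illustration. Each row holds one read's (hypothetical) likelihood under each candidate lineage, and EM alternates between assigning reads fractionally to lineages and re-estimating lineage proportions.

```python
def em_abundances(likelihoods, n_iter=200):
    """Estimate mixture proportions from per-read, per-lineage likelihoods.

    likelihoods[r][j] is an assumed likelihood of read r under lineage j.
    Returns a list of proportions that sums to 1.
    """
    n_reads = len(likelihoods)
    n_lineages = len(likelihoods[0])
    pi = [1.0 / n_lineages] * n_lineages  # start from a uniform mixture

    for _ in range(n_iter):
        totals = [0.0] * n_lineages
        for row in likelihoods:
            # E-step: fractional assignment of this read to each lineage
            weighted = [p * l for p, l in zip(pi, row)]
            norm = sum(weighted)
            if norm == 0.0:
                raise ValueError("read has zero likelihood under every lineage")
            for j, w in enumerate(weighted):
                totals[j] += w / norm
        # M-step: new proportions are the average fractional assignments
        pi = [t / n_reads for t in totals]
    return pi


# Toy input: three reads that favour lineage 0, one read that favours lineage 1.
toy = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
estimate = em_abundances(toy)
print([round(p, 4) for p in estimate])
```

Because the reads are only fractionally assigned, ambiguous reads shift the estimate without forcing a hard call for any single read.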
Because we score placements dynamically during a single tree traversal, it is very fast. Panmap takes less than one second to place a SARS-CoV-2 sample onto a 20,000-sample phylogeny, align the reads, and call genotypes.
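A rough sketch of why a single traversal is cheap, under stated assumptions: the score here is simply the number of read seeds that a node contains, which is a stand-in and not Panmap's actual scoring function. The tree, seed names, and the helper `place` are hypothetical. Each node stores only the seeds added or removed relative to its parent, so moving to a child updates the score using the edited seeds alone.

```python
from collections import Counter


def place(children, edits, root, root_seeds, read_seed_counts):
    """Return (best_score, best_node) from one depth-first traversal.

    edits[node] = (added, removed) seed sets relative to the parent.
    Assumes 'added' seeds are absent from the parent and 'removed'
    seeds are present in it, so the score delta is exact.
    """
    root_score = sum(c for s, c in read_seed_counts.items() if s in root_seeds)
    best = [root_score, root]

    def visit(node, score):
        for child in children.get(node, []):
            added, removed = edits[child]
            # Only the edited seeds can change the score on this branch.
            delta = sum(read_seed_counts.get(s, 0) for s in added)
            delta -= sum(read_seed_counts.get(s, 0) for s in removed)
            child_score = score + delta
            if child_score > best[0]:
                best[0], best[1] = child_score, child
            visit(child, child_score)

    visit(root, root_score)
    return best[0], best[1]


# Hypothetical four-node tree with made-up seed names.
children = {"root": ["A", "B"], "A": ["A1"]}
root_seeds = {"s1", "s2", "s3"}
edits = {
    "A": ({"s4"}, {"s1"}),
    "A1": ({"s5"}, set()),
    "B": ({"s6"}, {"s2"}),
}
reads = Counter({"s2": 3, "s3": 3, "s4": 2, "s5": 2})
best_score, best_node = place(children, edits, "root", root_seeds, reads)
print(best_node, best_score)
```

The work per branch scales with the number of seed edits on that branch, not with genome length, which is the property the traversal relies on.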
Panmap gets really accurate placements, even at very low sequencing depths.
There is a ton of cool stuff you can do using the index. In the first application, Panmap places a single sample onto the tree *without assembly* using the raw reads, and can then align and genotype based on this closest known relative.
The central innovation of Panmap is indexing PanMANs by producing a “syncmer annotated tree”, where seed edits are stored only where sequences on the phylogeny change (including inferred ancestors). The index therefore exploits a type of “phylogenetic compression” and is very compact.
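As an illustration of the "seed edits" idea, here is a small sketch. It is a simplification and not the PanMAN or Panmap format: seeds are closed syncmers chosen by lexicographic order (real tools typically use hashes and keep positions), and the sequences, `k`, `s`, and function names are assumptions. A child node stores only the seeds it gains or loses relative to its parent.

```python
def closed_syncmers(seq, k=5, s=2):
    """Set of k-mers whose smallest s-mer is at the first or last position."""
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        smallest = min(smers)
        if smers[0] == smallest or smers[-1] == smallest:
            out.add(kmer)
    return out


def seed_edits(parent_seeds, child_seeds):
    """Delta stored on the branch: (seeds added, seeds removed)."""
    return child_seeds - parent_seeds, parent_seeds - child_seeds


def apply_edits(seeds, edits):
    """Rebuild a child's seed set from its parent's set plus the delta."""
    added, removed = edits
    return (seeds - removed) | added


# Hypothetical sequences: one child identical to the root, one with a single SNP.
root_seq = "ACGTTGCATGCCGATAGCTA"
child_same = root_seq
child_snp = root_seq[:10] + "A" + root_seq[11:]

root_seeds = closed_syncmers(root_seq)
same_edits = seed_edits(root_seeds, closed_syncmers(child_same))
snp_edits = seed_edits(root_seeds, closed_syncmers(child_snp))

print("seeds at root:", len(root_seeds))
print("edits for identical child:", same_edits)
print("edits for SNP child:", snp_edits)
```

An unchanged branch stores nothing, and a branch with a mutation stores only the seeds overlapping the change, which is where the compression comes from.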
This is the result of our incredibly rewarding collaboration with Yatish Turakhia and members of his lab. In particular, and foundational to our work, we previously developed the underlying phylogenetic pangenome data structure, PanMAN.
www.nature.com/articles/s41...
Alex Kramer, Alan Zhang and friends posted our preprint today. In it, we introduce Panmap, a tool for phylogenetic placement, assembly, lineage abundance estimation, and eDNA assignment using phylogenetic pangenomes.
www.biorxiv.org/content/10.6...
Open question for anyone who develops or maintains bioinformatics tools, especially those used for public health - what do you think is the right way to fund this long term?
Grants are great for new research directions, but aren't really appropriate for most tool dev/maintenance.
Disclaimer: I am a terrible web developer and this extension should not be used by anyone.
Thanks so much, Ian! This means a lot coming from a mighty core-pivotal 17.
I’m already getting community feedback on the core-pivotal index. I truly appreciate it. But in the spirit of core-pivoteering, I will only implement suggestions from whoever has the highest core-pivotal index.
Why this matters:
As biobanks + population sequencing projects scale into the hundreds of thousands (and millions), the bottleneck isn’t just inference. It’s exploration.
Feedback welcome 🌲
Truly, we harvest the truffulas standing on very tall shoulders.
On the visuals inspiration side: a huge thanks to @theo.io and Taxonium.
Taxonium showed the community that we can interactively explore enormous phylogenetic trees in the browser at pandemic scale.
Lorax brings a similar philosophy to ARGs.
Under the hood, Lorax runs on the incredibly powerful data model + API from tskit, developed by Jerome Kelleher and collaborators.
Tree sequences make it possible to store and traverse genome-wide genealogies efficiently, and Lorax uses that structure directly in the backend.
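A toy sketch of the tree sequence idea, in plain Python rather than tskit itself (it does not call the tskit API, and the `Edge` type, node IDs, and coordinates are made up for illustration). Each edge records the genomic interval over which a parent-child relationship holds, so relationships shared by neighbouring local trees are stored once.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    left: float    # inclusive start of the genomic interval
    right: float   # exclusive end of the genomic interval
    parent: int
    child: int


def local_tree(edges, position):
    """Child -> parent map for the tree covering one genomic position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def ancestors(parent_of, node):
    """Walk from a node up to the root of its local tree."""
    path = []
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path


# Samples 0, 1, 2 on a genome of length 100 with one recombination breakpoint at 50.
edges = [
    Edge(0, 100, 3, 0),   # node 3 is the parent of samples 0 and 1 everywhere
    Edge(0, 100, 3, 1),
    Edge(0, 50, 4, 2),    # left of the breakpoint the root is node 4
    Edge(0, 50, 4, 3),
    Edge(50, 100, 5, 2),  # right of the breakpoint the root is node 5
    Edge(50, 100, 5, 3),
]

left_path = ancestors(local_tree(edges, 10), 0)
right_path = ancestors(local_tree(edges, 75), 0)
print(left_path, right_path)
```

Scanning every edge per query is only for clarity here; tskit's own data structures avoid that by updating the tree incrementally as the position moves along the genome.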
What does Lorax do?
It lets you dynamically explore tree sequences at massive scales - zooming through local trees, inspecting mutations, querying ancestry.
Built for *huge* datasets with up to millions of samples.
tl;dr:
Read about Lorax: www.biorxiv.org/content/10.6...
Try Lorax: lorax.ucsc.edu/view/1kg_chr...
Install Lorax: pypi.org/project/lora...
@pratikkatte.bsky.social and I just released Lorax 🌲, a tool for interactive exploration of biobank-scale ancestral recombination graphs (ARGs).
If you’ve ever wanted to scroll across the ancestries of thousands of genomes… this is for you.
We have posted data providing real-time measurement of human neutralizing antibody landscape to seasonal influenza.
Data explain spread of subclades K (H3N2) & D.3.1.1 (H1N1), identify subclade K subvariants w reduced neutralization, & can inform choice of strains for next vaccine.
A long time ago in a galaxy far away, there was a SARS-CoV-2 pandemic. Our paper, led by @martibartfast.bsky.social
a) correcting errors in 4.5 million genomes & their phylogeny
b) improving representation of the Global South in public data
www.nature.com/articles/s41...
(thread 1/n)
@cademirch.bsky.social @erikenbody.bsky.social TB Sackton & @russcd.bsky.social introduce Callable Loci And More (clam), a tool that leverages callable loci to accurately estimate population genetic statistics (π, dxy, and FST).
🔗 doi.org/10.1093/molbev/msaf282
#evobio #molbio #compbio
Genetti & @russcd.bsky.social investigated Puerto Rico honeybees, suggesting that local pressures on bee behavior may have induced changes in alleles linked to different ancestries at loci involved in neuronal development, behavior, and mating.
🔗 doi.org/10.1093/gbe/evaf217
#genome #evolution
I am 100% stoked to lead this effort again this year! Come on out to beautiful Santa Cruz and show off your awesome-est science.
Our lab has an opening for a research technician to contribute to our efforts to understand RSV evolution & its impact on antibody countermeasures (see journals.asm.org/doi/full/10....). The tech will also help w lab management.
If interested, apply here: careers-fhcrc.icims.com/jobs/29940/job
Thank you! I might need to steal this wonderful artwork!
Super glad to see this out!
Lovely collaboration with @russcd.bsky.social @crouxevo.bsky.social and the groups of @meauxjuliette.bsky.social @plantadaptation.bsky.social