Posts by Andrew Carroll

Release led by Kishwar Shafin.

Special thanks to Ehud Amitai and Ultima Genomics for a contribution that improves accuracy (~4% error reduction) for all technologies.

Core team: Alexey Kolesnikov, Daniel Cook, Lucas Brambrink, and manager Pi-Chuan Chang. 20%er contributions are listed in the release notes.

1 month ago
Release DeepVariant 1.10.0 · google/deepvariant DeepVariant: Continuous phasing: Long-read variant calls (PacBio and ONT) are now natively phased and phased output is generated for both vcf and gvcf formats. Fuzzy channels: Added “fuzzy channel...

Release of DeepVariant v1.10

Phased VCF output for long-reads
Accuracy improvements for multi-allelic variants
Pangenome accuracy improvements (18% fewer errors)
Most technologies ~10% faster
RNA-seq is a fully supported mode
DeepSomatic is 12-40% faster

github.com/google/deepv...

1 month ago
Comparison of read mappings at HG002 chr4:40,294,825-40,295,700, showing conventional (pbmm2) read mappings (above) and portello mappings (below). The same set of unaligned reads was used as input to each mapping process.

What if you could improve small variant accuracy, CNV inference, and interpretability of your HiFi WGS data by taking a different approach to read mapping? Our new preprint describes portello, a method which demonstrates the potential for such improvements. (1/5)

2 months ago
The "Why Not?" Era of Sequencing Has Begun YouTube video by OMGenomics

Lab tour and takeaways: www.youtube.com/watch?v=nS2o...

2 months ago
For Automation The Wet Lab Has An Incredibly Long Tail Disclaimer: These are solely my views.

I wrote up some thoughts on the automation of lab work, in particular what it means for how people will work in the lab. In short, it will deliver a lot of value for assays run at scale, but there is a long tail of experiments where humans are essential.

andrewcarroll.github.io/2026/02/09/f...

2 months ago
African penguin
Source: Wildlife Conservation Society

Cotton top tamarin
Source: Wildlife Conservation Society

Eld's deer
Source: Wildlife Conservation Society

Elongated tortoise
Source: Wich’yanan (Jay) Limparungpatthanakij, via inaturalist.org and Wikimedia Commons

Thanks to the support of @wcs.org and Google Research, we have sequenced and assembled the genomes of nine endangered species, with more on the way!

To learn more: blog.google/innovation-a...

2 months ago
How we’re helping preserve the genetic information of endangered species with AI Scientists are working to sequence the genome of every known species on Earth.

This blog post highlights the great work of the @ebpgenome.bsky.social. To support it, Google.org has funded the sequencing and open release of 13 genomes, with a $3M commitment to sequence 150 more and to develop methods that improve assembly finishing and address other bottlenecks.

blog.google/innovation-a...

2 months ago

The killing of Alex Pretti is a heartbreaking tragedy. It should also be a wake-up call to every American, regardless of party, that many of our core values as a nation are increasingly under assault.

2 months ago
The Virtual Cell Will Be More Like GWAS Than AlphaFold There has been significant discussion recently on the concept of the “virtual cell.” I want to summarize the key concepts regarding what the field wants from a virtual cell and the challenges we face....

I've been thinking about the "virtual cell" concept and wanted to write up a few thoughts, specifically on how I think our prior experience with GWAS points to the most likely way these models will be useful.

andrewcarroll.github.io/2025/12/23/t...

3 months ago

Hoiho - the world’s rarest penguin, fewer than 150 mainland pairs left

🐧We researched one of the world’s rarest #penguins. The yellow‑eyed penguin (aka hoiho/takaraka) isn’t one homogeneous species after all!

www.biorxiv.org/content/10.1...

#hoiho #conservation #genomics #birds #nzwildlife #endangered #wildlife #nature

5 months ago

Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic www.nature.com/articles/s41... (read free: rdcu.be/eLny0) github.com/google/deeps...

6 months ago
A complete diploid human genome benchmark for personalized genomics Human genome resequencing typically involves mapping reads to a reference genome to call variants; however, this approach suffers from both technical and reference biases, leaving many duplicated and ...

Delighted to finally announce a preprint describing the Q100 project! “A complete diploid human genome benchmark for personalized genomics” For which we finished HG002 to near-perfect accuracy: www.biorxiv.org/content/10.1... 🧵[1/14]

6 months ago
Germline Small Variant Calling Workflow for SBX Duplex Data Wednesday, September 10, 2025 at 12:00 PM Eastern Daylight Time.

I'll be speaking in this webinar (go.roche.com/sbx-d) on September 10, where I'll share our benchmarks and observations for Roche's SBX sequencing instrument, as well as models developed by our team for SBX data.

8 months ago

Also thanks to 20%er contributors: Ben Soudry, Mike Kruskal, Sowmiya Nagarajan, Suchismita Tripathy, Francisco Unda, Vasiliy Strelnikov

And community contributions from Sam Yadav and Seraj Ahmad at Roche improving the code for custom model training

11 months ago
Release DeepSomatic 1.9.0 · google/deepsomatic DeepSomatic: In this release, we are introducing FFPE_WGS_TUMOR_ONLY and FFPE_WES_TUMOR_ONLY models. The WGS and WGS_TUMOR_ONLY models have been retrained with all datasets described in the manusc...

Release led by Kishwar Shafin, contributions by Daniel Cook, Alexey Kolesnikov, Lucas Brambrink, and Pi-Chuan Chang as engineering manager.

Thanks to student researcher contributions from Farica Zhuang and Mobin Asri.

DeepSomatic release page:
github.com/google/deeps...

11 months ago
Release DeepVariant 1.9.0 · google/deepvariant DeepVariant: In this version we have updated our training scheme for the HG002 sample with the newly released HG002-T2T truth set which improves accuracy against that truth set. Our labeling metho...

Release of DeepVariant and DeepSomatic v1.9

DV: Now trained on HG002 T2T-Q100. Error reduction of 12% for Illumina and 30% for PacBio on this truth set. 25% faster. DeepTrio is 5x faster (20h -> 4h).

DS: New models FFPE_TUMOR_ONLY for {WGS, WES}. Much improved WGS models.

github.com/google/deepv...

11 months ago

Incredibly moving Justin Trudeau remarks:

"We have fought and died alongside you....During your darkest hours...we were always there. Standing with you, grieving with you, the American people....Canadians are a little perplexed as to why our closest friends and neighbors are choosing to target us."

1 year ago

You have some additional control over memory use through the number of threads you run with.

For running on GPU, I am not sure if you've seen this - github.com/google/deepv...

It requires a little more configuration, but it can let you better manage CPU-GPU tradeoffs. Definitely for expert use.

1 year ago

Hi Eric, sorry for not noticing this until now. According to the DV FAQ, the Keras model takes 16GB of memory (github.com/google/deepv...).

It's possible that pangenome-aware models will take more memory, and we do observe higher memory use per thread with them. Definitely not lower than 16GB.

1 year ago

Great question. We were talking recently about L40S benchmarks. We don't have that data immediately on hand, but are planning to generate runtime stats for it.

1 year ago

They're very close, to the point that small changes in coverage or the use of PCR in library preparation could tip the balance from one to the other.

1 year ago

Release led by DeepVariant tech lead Kishwar Shafin. Team engineering manager Pi-Chuan Chang. Small model work led by Lucas Brambrink. Pangenome-aware work led by Mobin Asri and Juan Carlos Mier. Fast pipeline by Alexey Kolesnikov. Kinnex/MAS-Seq model by Daniel Cook and Shiyi Yin from Verily. 3/3

1 year ago
Plots of SNP and Indel error counts for DeepVariant models, showing an Indel error reduction of 26% for PacBio and a ~50% SNP error reduction for ONT.

Added SPRQ to PacBio training, reducing Indel error on SPRQ by 26%. Added Platinum Pedigree training data for PacBio model, reducing errors by 34% on more extensive Platinum truth. New model and case study for Kinnex/Mas-Seq/Iso-Seq. Additional speed options for GPU pipelines 2/3

1 year ago
Runtime figure for the new version of DeepVariant with and without the small model, showing runtime reductions from 155 to 101 minutes with Illumina, 174 to 71 minutes with PacBio, and 295 to 114 minutes with Oxford Nanopore.

Release of DeepVariant 1.8. Large speed improvement (~67% faster) via small model for easy sites. New Pangenome-aware option. Reduces error by ~30% for vg-mapped WGS, ~10% for BWA WGS, ~5% BWA exome. New config for custom model users, see release notes.

(github.com/google/deepv...)

1 year ago
Personalized Pangenome References bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution

How do we make a pangenome maximally relevant for the study of a new sample? www.biorxiv.org/content/10.1...

2 years ago

Release by Kishwar Shafin

Major contributions from Pi-Chuan Chang, Daniel Cook, Alexey Kolesnikov

Google 20%ers: Will Kwan, Pauline Sho, Lucas Brambrink, Mo Samman, Atilla Kiraly

UCSC for vg: @benedictpaten.bsky.social, Shloka Negi, Jimin Park, Mobin Asri

Pacbio: Billy Rowell, Nathaniel Echols

2 years ago

There are now custom models and case studies for Complete Genomics instruments.

T7: github.com/google/deepv...

G400: github.com/google/deepv...

For now, these are standalone models. We'll likely consider later whether they can be included jointly in the broad WGS model.

2 years ago

The changes to DeepTrio for de novo detection are substantial. We now train in two steps: first for overall accuracy, then with a weighted fine-tuning for de novos. Our benchmarks show large improvements in de novo calling relative to the prior DeepTrio.

github.com/google/deepv...

2 years ago

Want to benefit from pangenomes and want a recipe?

github.com/google/deepv...

It shows a step-by-step process, with Docker images, for mapping to a pangenome reference with vg and calling variants with DeepVariant. The final calls are more accurate and are in GRCh38 coordinates. Thanks to the UCSC team for co-development.

2 years ago
Release DeepVariant 1.6.0 · google/deepvariant Improved support for haploid regions, chrX and chY. Users can specify haploid regions with a flag. Updated case studies show usage and metrics. Added pangenome workflow (FASTQ-to-VCF mapping with V...

Release of DeepVariant v1.6.

Support for haploid regions, chrX/Y.
Workflow for Pangenome FASTQ-to-VCF.
Major DeepTrio improvements for de novo variants.
Models for Complete Genomics T7, G400
Added NovaSeq X to training data

Release by Kishwar Shafin

github.com/google/deepv...

2 years ago