Konrad (@konradjk) Bsky

Special thanks to all @gnomad-project.bsky.social participants, data contributors, and funders; to @ksamocha.bsky.social, Jeremy Guez, and Julia Goodrich; and to all co-authors for their tireless efforts on this

3 weeks ago

To nominate disease genes, we introduce two discovery scores: ΔPEPPER flags genes where biological features predict clinical impact beyond what’s published, and DisPo highlights genes under strong constraint with limited literature. Together, they prioritize hundreds of candidate genes for follow-up

3 weeks ago

We train a new model on biomedical literature (PEPPER_XGB). Mixing LOEUF and PEPPER makes an OMELET, which outperforms either individual model in identifying disease genes

3 weeks ago

Precision-recall curves showing LOEUF-MIS outperforming other metrics

We introduce LOEUF-MIS, combining constraint on pLoF variants with constraint on the top 1% of predicted-deleterious missense variants. This captures not just LoF but also gain-of-function and dominant-negative signals.

3 weeks ago
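For context, LOEUF is the upper end of a 90% confidence interval on a gene's observed/expected (o/e) ratio of pLoF variants. The sketch below is assumed to follow the Poisson-likelihood grid approach used for earlier gnomAD constraint metrics; it is an approximation, not the gnomAD v4 code. The post doesn't say how LOEUF-MIS combines pLoF and missense counts, so `loeuf_mis_sketch` (a hypothetical helper) simply pools them as an illustrative assumption.

```python
import math


def oe_upper_bound(observed: int, expected: float, upper: float = 0.95,
                   max_oe: float = 2.0, step: float = 0.001) -> float:
    """Upper bound on a gene's observed/expected (o/e) ratio.

    Models the observed count as Poisson(oe * expected), evaluates that
    likelihood on a grid of o/e values in [0, max_oe], normalises it, and
    returns the first grid value whose cumulative density reaches `upper`.
    With upper=0.95 this is the top of a 90% interval (LOEUF-style).
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    n_steps = int(round(max_oe / step))
    grid = [i * step for i in range(n_steps + 1)]
    likelihoods = []
    for oe in grid:
        lam = oe * expected
        if lam == 0:
            # A Poisson with mean 0 puts all its mass on zero observations.
            likelihoods.append(1.0 if observed == 0 else 0.0)
        else:
            log_p = observed * math.log(lam) - lam - math.lgamma(observed + 1)
            likelihoods.append(math.exp(log_p))
    total = sum(likelihoods)
    cumulative = 0.0
    for oe, p in zip(grid, likelihoods):
        cumulative += p / total
        if cumulative >= upper:
            return oe
    return max_oe


def loeuf_mis_sketch(obs_plof: int, exp_plof: float,
                     obs_mis: int, exp_mis: float) -> float:
    """HYPOTHETICAL combination: pool pLoF and top-1% missense counts.

    How LOEUF-MIS actually combines the two classes is not stated in the
    post; summing observed and expected counts is only an assumption.
    """
    return oe_upper_bound(obs_plof + obs_mis, exp_plof + exp_mis)
```

For example, a gene with 0 observed pLoF variants against 20 expected gets an upper bound of roughly 0.15 (strongly constrained), while one with 50 observed against 50 expected lands well above 1.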

Figure 1a-b, growth of variation with sample size

gnomAD v4's 5x sample increase benefits research into both common and rare disease: more common variants observed across ancestries improve diagnostic filtering, while more rare variants strengthen constraint metrics for disease gene detection

3 weeks ago

Integrating 730,947 exome sequences with clinical literature improves gene discovery
Accurate estimates of allele frequencies aid in genetic discovery, including rare disease diagnosis, common disease investigations, and population genetics. Here, we present the Genome Aggregation Dat...

Excited to share our new preprint on gnomAD v4! We present the full analysis of 730,947 exomes — new constraint metrics, improved LoF annotation (LOFTEE-2), LLM-based literature curation, and a unified framework for gene discovery and rare disease diagnosis. www.medrxiv.org/content/10.6...

3 weeks ago

The analogy I find myself frequently using here: if everyone smoked (or no one did), the heritability of lung cancer would go up. We think about this a lot in the context of cross-biobank comparisons

2 months ago
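The smoking analogy can be made concrete with the textbook variance decomposition of heritability, assuming no gene-environment interaction or covariance and an unchanged genetic variance. If smoking contributes part of the environmental variance, making exposure uniform (everyone smokes, or no one does) removes that part, so the same genetic variance makes up a larger share of the total. The numbers in the last line are purely illustrative.

```latex
h^2 = \frac{V_G}{V_P} = \frac{V_G}{V_G + V_E}, \qquad
V_E' = V_E - V_{\text{smoking}} < V_E
\;\Longrightarrow\;
h'^2 = \frac{V_G}{V_G + V_E'} > h^2

\text{e.g. } V_G = 0.2,\; V_E = 0.8,\; V_{\text{smoking}} = 0.4:\quad
h^2 = \frac{0.2}{1.0} = 0.2 \;\to\; h'^2 = \frac{0.2}{0.6} \approx 0.33
```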

gs://ukb-diverse-pops-public/misc/pairwise/pairwise_correlations_regressed.txt.bgz - it’s coded the same way as our Pan-UKB phenotypes, so I’m not sure it’s super easy to use, but that’s pairwise r_p for ~14k phenos

6 months ago

Special thanks to all co-authors that got this here including @masakanai.bsky.social @rahulg603.bsky.social @dalygene.bsky.social @egatkinson.bsky.social and of course @genetisaur.bsky.social for driving this through 5 years of work (after the first GWASes were done!)

7 months ago

Tons of lessons learned around carefully controlling population stratification, using heritability as a QC metric, and, probably most importantly, quantifying novelty in a mega-phenotype analysis. Some really cool analyses to find interesting biology, e.g., allelic series and ancestry-enriched variants

7 months ago

Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects - Nature Genetics
Genome-wide analyses for 7,266 traits leveraging data from several genetic ancestry groups in UK Biobank identify new associations and enhance resources for interpreting risk variants across diverse p...

A project many years in the making: we’re pleased to present our work on multi-ancestry meta-analysis across a boatload of traits in the UK Biobank: www.nature.com/articles/s41...

7 months ago

All by All
The All by All browser maps known and novel associations between genotypes and phenotypes using data contributed by All of Us Research Program participants as of July 1, 2022. All by All encompasses a...

We’ve put up summary statistics for over 3,000 traits in the All of Us resource, and a shiny new browser alongside it! Explore your favorite gene or phenotype here: allbyall.researchallofus.org #ASHG24

1 year ago

Starter pack of people who create starter packs?

1 year ago

You mean “ReNally???”?

1 year ago

Heh, it was on our list but somehow never made it into the pre-submission checklist. Will do!

1 year ago

Interesting question. We do have a “gnomAD-new” analysis in there but haven’t broken it down by ancestry - I fear a lot is going to be driven by “not yet observed” (which is the same across all ancestries)

1 year ago

It gets a bit more complicated, though - these scores mix variant-to-gene effects with prioritization of which genes, when disrupted, lead to phenotypes. Perhaps a new method that combines both these insights optimally will outperform them all!

1 year ago

We found that population-focused methods do best for identifying highly impactful variants (de novos in individuals with developmental disorders, for instance), while the deep learning methods are better at prioritizing inherited variation in biobanks

1 year ago

Variant scoring performance across selection regimes depends on variant-to-gene and gene-to-disease components
bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution

We have a new preprint that we’d love feedback on! We benchmarked a bunch of variant scoring methods to figure out what they were actually doing, and how they performed across selection regimes: www.biorxiv.org/content/10.1...

1 year ago

Welcome new followers (and thanks @michelnivard.bsky.social)! I’m loving the critical mass, and to celebrate, I’ll post some exciting new content (my first time posting here and not on the other site)

1 year ago

Recently out on #bioRxiv: our updated approach to identify regional variability in missense mutation intolerance (“constraint”) in protein-coding genes using the gnomAD database.

www.biorxiv.org/content/10.1...

1/10

2 years ago

The Scalable Variant Call Representation: Enabling Genetic Analysis Beyond One Million Genomes
bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution

As genomic analyses scale to millions of exomes/genomes, we need a scalable infrastructure to process/QC/handle these data while retaining all the metrics needed for downstream analysis. A new preprint from the Hail team proposes a way to do this! Comments welcome: www.biorxiv.org/content/10.1...

2 years ago

Paella is good. With LOEUF, I assumed the culmination would be an omelette, but CHARR is better in a paella, so maybe it’s a multi-course meal

2 years ago

Extended data figure 2b has exclusive exon-only. I think we internally made some with intermediate overlaps, and the result was intermediate, as you’d expect

2 years ago

An expanded genomic database for identifying disease-related variants
An expanded version of a human-genome database called gnomAD, containing 76,156 whole-genome sequences, has enabled investigation of how variants in non-protein-coding regions of the genome affect hea...

And thanks to Ryan Dhindsa and Slavé Petrovski for the excellent writeup and context around our work. Excited for the times ahead! www.nature.com/articles/d41...

2 years ago

This is all thanks to an amazing production team, browser team, and steering committee @gnomad-project.bsky.social, the 76,156 individuals that provided their genomes, and support from Broad Genomics and Hail

2 years ago

Interestingly, these scores also provide additional insight into genes regulated by these regions, even those for which previous constraint metrics were underpowered:

2 years ago

Gnocchi extends our constraint metrics to the non-coding genome, highlighting, for instance, disease-associated non-coding CNVs

2 years ago

We built a new metric we called gnocchi (genomic non-coding constraint of haploinsufficient variation), building on methods that find depletions of variation (natural selection), which we show can prioritize functional variation

2 years ago
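The core idea behind gnocchi, finding depletions of variation, can be sketched as a per-window score that compares the number of variants observed in a genomic window with the number expected under a neutral mutation model. This sketch assumes a Poisson-style Z score of the form (expected - observed) / sqrt(expected), so positive values mean "more constrained"; the window size, mutation model, and any adjustments used in the actual paper are not given in the post, and `depletion_z`, `rank_windows`, and the window names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def depletion_z(observed: int, expected: float) -> float:
    """Z score for depletion of variation in one genomic window.

    Positive values mean fewer variants were observed than the neutral
    mutation model expects, i.e. the window looks constrained.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


def rank_windows(counts: Dict[str, Tuple[int, float]]) -> List[Tuple[str, float]]:
    """Score each window from (observed, expected) counts, most constrained first."""
    scored = [(name, depletion_z(obs, exp)) for name, (obs, exp) in counts.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With made-up counts, a window observing 50 variants where 100 are expected scores 5.0, one observing 98 scores 0.2, and one observing 130 scores -3.0, so `rank_windows` puts the depleted window first.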

A genomic mutational constraint map using variation in 76,156 human genomes - Nature
A genomic constraint map for the human genome constructed using data from 76,156 human genomes from the Genome Aggregation Database shows that non-coding constrained regions are enriched for regulator...

Thrilled to have our work on gnomAD out in print at Nature today. With 76K genomes, we can look beyond the coding genome and into the non-coding genome to find regions important for human disease idp.nature.com/authorize?re...

2 years ago
