If you have more questions, you can check out our FAQ (gnomad.broadinstitute.org/help) or the Forum (discuss.gnomad.broadinstitute.org).
Posts by Kaitlin Samocha
We are always working to make the resource better and greatly appreciate the feedback we've received from the community!
A huge thank you to the tireless gnomAD production and browser teams, who made this possible. Special thanks to Katherine Chao, Julia Goodrich, Phil Darnowsky, Ruchit Panchal, Kristen Laricchia, and Mike Wilson.
More details + extra features can be found in the blog post.
We are excited to share our gnomAD v4.1.1 release
gnomad.broadinstitute.org/news/2026-03...
Major changes:
* Constraint scores on X and Y
* Improved coverage correction
* LOFTEE fix
* Guidance on constraint cut-offs
* New quality flag for low coverage/mappability genes
@gnomad-project.bsky.social
To nominate disease genes, we introduce two discovery scores: ΔPEPPER flags genes where biological features predict clinical impact beyond what's published, and DisPo highlights genes under strong constraint with limited literature. Together they prioritize hundreds of candidate genes for follow-up.
We train a new model on biomedical literature (PEPPER_XGB). Mix LOEUF and PEPPER to make an OMELET, which outperforms each individual model in identifying disease genes.
Precision-recall curves showing LOEUF-MIS outperforming other metrics
We introduce LOEUF-MIS, combining pLoF and top 1% predicted deleterious missense constraint. This captures not just LoF but also gain-of-function and dominant-negative signals.
Figure 1a-b, growth of variation with sample size
gnomAD v4's 5x sample increase benefits both common and rare disease research: more common variants observed across ancestries improve diagnostic filtering, while more rare variants strengthen constraint metrics for disease gene detection.
Excited to share our new preprint on gnomAD v4! We present the full analysis of 730,947 exomes — new constraint metrics, improved LoF annotation (LOFTEE-2), LLM-based literature curation, and a unified framework for gene discovery and rare disease diagnosis. www.medrxiv.org/content/10.6...
If you are submitting an NIH grant in February, you will be required to use SciENcv to prepare your biosketch.
IT IS MUCH WORSE THAN YOU CAN POSSIBLY IMAGINE.
Set aside *at least* 4 hours just to transfer an existing biosketch into SciENcv.
I guess we all decided to work on the biosketch at the same time?
Ugh. This is the part I didn’t finish yet. The interface kept crashing and it was a nightmare to find things.
I have a nicely curated MyBibliography elsewhere but it seems like it didn’t link to that one.
Good to know. 🥲
Is there an official "this is what it should look like" version floating around? I was working on it today and the preview looks weird, but maybe that is just the way it is.
Why do some individuals defy their polygenic score?
In the largest study of its kind (402k UKB individuals; 7 continuous traits + 3 diseases), we asked: If your phenotype deviates from common-variant polygenic score prediction, what's driving that difference?
www.medrxiv.org/content/10.6...
Huge congratulations on your next step! Sounds like an amazing opportunity. :)
Massive single-cell study by Kanai et al (www.medrxiv.org/content/10.1...):
- Once statistical power is high, constrained genes have more (though weaker) eQTLs.
- Chromatin-QTLs near constrained genes have "normal" effect sizes, colocalize more with disease, but exhibit attenuated peak-gene effects.
New paper on everyone’s favourite topic, QC!
We show why you should do genotype-level QC on your WGS data
www.biorxiv.org/content/10.1...
Very real quotes about this paper -
“The most exciting, mind-blowing paper of the year!”
“On a par with Fisher 1918”
“I read it every night. Just so beautiful”
New study of 800K+ genomes from gnomAD reveals most “pathogenic” variants in healthy people aren’t truly disease-tolerant. They are explained by annotation errors, mosaicism, or compensatory variants. 🧬
A big step for precision medicine!
www.nature.com/articles/s41...
I haven't found it printed but there is a PDF link here: www.ashg.org/wp-content/u...
📃 We’re excited to share our latest work, now published in Nature Communications — a major update to the Genome Aggregation Database (gnomAD) that improves allele frequency resolution for two gnomAD-defined genetic ancestry groups using local ancestry inference (LAI).
Now published! Our paper on:
(1) Accurate sequencing of sperm at scale
(2) Positive selection of spermatogenesis driver mutations across the exome
(3) Offspring disease risks from male reproductive aging
[1/n]
www.nature.com/articles/s41...
Image of an old building in Oxford with the heading 'postdoc opportunities' and the text 'computational approaches to improve rare disease diagnosis and treatment' and 'Big Data Institute, University of Oxford'
📣 We are recruiting! Please share!!
Are you a bioinformatician / computational scientist who wants to apply your skills to understanding regulatory biology and improving rare disease diagnosis and treatment? 🧠 💻 🧬 🩺
We have two roles available 👇
🧵 1/4
Isn't genetics cool???
Within only 145 nucleotides(!) of a non-coding RNA (RNU4-2) - different variants in distinct regions / structures cause three distinct disorders!!! (all discovered within the last 18 months)
🤯🤓🧬❤️
🗣️ Quote of #ESHG2025 (so far)
"Who licks bone !?!" 🦴
- Johannes Krause
Anyone have that on your bingo card?
Well, apparently archaeologists do, to distinguish bone from stone, and it causes problems for DNA sequencing. 🤔
We are just wrapping up day 1 at #ESHG2025 in beautiful Milan. For those who want some extra fun while listening to the great science, you can play bingo.👇
I know multiple of these have already occurred.
Buongiorno Milano! Ready for a great day 1 of #eshg2025?
Packed program of excellent science 8.30am-8.00pm - plus networking event till 9.30pm to meet many friends, colleagues and collaborators! …andiamo @eshg.bsky.social @eshgyoung.bsky.social
🤗 Hugely excited to share our work on automating iterative reanalysis in #raredisease, preprint out: www.medrxiv.org/content/10.1... 🤖🧬
github.com/populationge...
A superb collaboration with @dgmacarthur.bsky.social @cassimons.bsky.social @heidirehm.bsky.social @ksamocha.bsky.social and many more!
Human Developmental Cell Atlas (HDCA) expression data is now displayed. Expression is shown in 12 sections of a human embryo at 6-7 weeks post-conception, alongside a sagittal view showing the region of the embryo represented by each section @mhaniffa.bsky.social
A few weeks ago, I had an incredibly emotional call with James Coney, a writer for the Sunday Times whose son Charlie was in the @genomicsengland.bsky.social 100k project and was recently diagnosed with ReNU syndrome. This beautiful article tells their story ❤️ www.thetimes.com/article/0bcc...