Excited for our publication on how the geographic scale of a sample affects the discovery of rare, deleterious variants to be out this week. With a mix of theory, simulation, and data analysis, we show that when samples are narrow vs. broad, the number of variants discovered and their frequencies change.
Posts by Maggie Steiner
Thank you!
Out today in @pnas.org! www.pnas.org/doi/10.1073/...
What do GWAS and rare variant burden tests discover, and why?
Do these studies find the most IMPORTANT genes? If not, how DO they rank genes?
Here we present a surprising result: these studies actually test for SPECIFICITY! A 🧵 on what this means... (🧪🧬)
www.biorxiv.org/content/10.1...
Hi! Could I please be added? Thanks for setting this up!
I just figured out how to use feeds! So, sharing this with #popgen 🧪
Thanks Erik!
Thanks to co-lead Dan Rice & co-authors @aabiddanda.bsky.social, Marida Ianni-Ravn, and Chris Porras!
Overall - while our theoretical model is no doubt a simplification of the complex dispersal/evolutionary processes seen in natural populations, especially humans - we hope that this work will help improve our interpretation of existing genetic studies and provide guidance for the design of new ones.
Our results have implications for several applications of genetic data. Power to detect trait/disease associations (e.g., GWAS) is tied to allele frequency. The SFS is also used for inference of the distribution of fitness effects, which our results suggest may be biased by effects of study design.
However, when it comes to avg. allele frequency across all sites (incl. monomorphic ones) these effects can cancel - in our theoretical model we see unchanging avg. allele frequency with sampling design. In human data we see this for fine scale samples (within the UK) but not for broader samples.
We find evidence of these effects in re-sampling experiments using the UK Biobank. For example, our broadest re-sample with n=10,000 discovers ~98% more variant LoF sites than our narrowest sample, but allele frequency at those variant sites is on average ~41% lower.
Broad samples will capture a greater number of rare, deleterious variants than narrow samples (we call this “discovery”), but each variant will be sampled at lower average frequency (we call this “dilution”). These effects lead to substantial changes in some summary statistics, especially for large samples.
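To make discovery and dilution concrete, here is a minimal sketch of a toy calculation. It is not the model from the paper: it assumes a hypothetical set of equally sized demes, with each site's variant private to a single deme at one fixed local frequency, and a fixed number of diploid individuals split evenly across the sampled demes. The function name, parameters, and default values are all made up for illustration.

```python
def expected_summaries(n_sampled_demes, n_demes=10, n_sites=10_000,
                       local_freq=0.01, n_individuals=1_000):
    """Expected SFS summaries under a toy model (an assumption, not the paper's model).

    Each site carries a variant that is private to one deme, at frequency
    `local_freq` in that deme and 0 elsewhere. Sites are spread evenly over demes.
    """
    # Diploid chromosomes drawn from each sampled deme (total sample size is fixed).
    chrom_per_deme = 2 * n_individuals / n_sampled_demes
    # Number of sites whose variant lives in one of the sampled demes.
    sites_in_range = n_sites * n_sampled_demes / n_demes
    # Probability that at least one copy of such a variant ends up in the sample.
    p_seen = 1 - (1 - local_freq) ** chrom_per_deme
    variant_sites = sites_in_range * p_seen
    # Expected sample allele frequency, summed over all sites.
    total_freq = sites_in_range * (local_freq * chrom_per_deme) / (2 * n_individuals)
    return {
        "variant_sites": variant_sites,
        # Ratio of expectations, used as a simple stand-in for the conditional mean.
        "mean_freq_at_variant_sites": total_freq / variant_sites,
        "mean_freq_all_sites": total_freq / n_sites,
    }


narrow = expected_summaries(n_sampled_demes=1)   # everyone from one deme
broad = expected_summaries(n_sampled_demes=10)   # same n, spread over all demes

for label, summary in (("narrow", narrow), ("broad", broad)):
    print(label, {key: round(value, 5) for key, value in summary.items()})
```

In this toy setting the broad sample finds more variant sites (discovery) at a lower mean frequency per variant site (dilution), while the average over all sites, monomorphic ones included, stays the same. That mirrors the qualitative pattern described in the thread, but only under these simplifying assumptions.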
We develop a model for the evolution of carriers of rare deleterious variants, and use it to approximate the site frequency spectrum (SFS, the distribution of allele frequencies) in samples at various scales of geographic breadth. We find several key patterns as samples go from “narrow” to “broad”.
We focus on rare, deleterious variants, which are expected to cluster in geographic space. Rare variants are also generally of interest since they tend to have large effects on traits (including disease traits), and can help improve understanding of biological mechanisms.
In particular, we are interested in geographic breadth, i.e., how broad a region individuals are sampled across. This is relevant to the current discourse in human genetics surrounding the Euro-centric bias of genetic datasets and the launch of new biobanks to improve representation globally.
Excited to share a new preprint with @jnovembre.bsky.social! We use a combination of population genetic theory, simulation, and data analysis to ask: how does the design of genetic studies (including biobanks) impact the discovery of rare, deleterious variants?