It was a pleasure working with this team, and I look forward to learning more from P. formosa, which is an excellent model for learning more about both speciation/hybrid incompatibility and the effects of inbreeding/genetic rescue. Hope you enjoy!
www.nature.com/articles/s41...
Posts by Nathan Schaefer
It's therefore possible that expression dynamics strongly influence which genes contribute to hybrid incompatibility, and that noncoding regulatory sequences play a much greater role than coding substitutions.
RME allows heterozygous alleles to influence cellular phenotype directly, without buffering one another; this can speed selection and cause rapid cross-species divergence.
RME genes are also overrepresented at loci where humans have purged archaic hominin ancestry:
www.science.org/doi/10.1126/...
We see strong enrichment of genes involved in processes like cell adhesion, cell migration, and cell-cell signaling. These appear to be the same types of genes that others found to undergo random cell-by-cell monoallelic expression (RME) in humans.
www.cell.com/cell-reports...
Surprisingly, we see little evidence of this when considering coding mutations: GC rarely touches fixed coding differences between the parental species.
What about noncoding sequence? We asked what types of genes tend to be near the noncoding sequences that GC changed the most in frequency.
When species diverge, gene networks can accumulate sets of new mutations that are compatible and preserve network function within each species, but which can cause dysfunction when brought together in a hybrid.
Does GC help alleviate this problem in P. formosa?
www.nature.com/scitable/con...
We then looked for evidence of positive selection in noncoding sequence and found that gene conversion copies sequences that were positively selected (low Tajima's D, high Fay & Wu's H, high Zeng et al.'s E) in their parental species of origin. P. formosa ends up with the handiest sequences from both parents.
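For readers less familiar with the statistics named above, here is a minimal sketch of Tajima's D, computed from per-site allele counts using the standard Tajima (1989) formula. The function name and input layout are hypothetical and chosen for illustration; this is not the code used in the study.

```python
import math

def tajimas_d(alt_counts, n):
    """Tajima's D from per-site alternate-allele counts among n sampled haplotypes.

    alt_counts: one integer per site (number of haplotypes carrying the alt allele).
    Returns NaN when there are no segregating sites or fewer than 4 haplotypes.
    """
    # Only segregating sites (0 < count < n) carry information.
    seg = [c for c in alt_counts if 0 < c < n]
    s = len(seg)
    if s == 0 or n < 4:
        return float("nan")

    # Mean number of pairwise differences (pi), summed over sites.
    pairs = n * (n - 1) / 2
    pi = sum(c * (n - c) for c in seg) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimators, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for example, after a selective sweep) pulls pi below S/a1 and makes D negative, which is why low values are read as one signal consistent with positive selection.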
We asked what types of mutations gene conversion (GC) is likeliest to copy versus overwrite and found evidence that it aids purifying selection. GC likes to eliminate young mutations and those with high variant impact (likely to be deleterious).
Gene conversion events occur at nonrandom sites and are associated with polyA/polyT repeats, which have been implicated in double strand break formation during DNA replication.
www.cell.com/cell/fulltex...
Through phased genome assembly, ancestral genome reconstruction, and population resequencing, we find evidence that gene conversion, in which P. formosa "overwrites" sequence from one of its haplotypes with the other, likely facilitates selection.
www.nature.com/articles/d41...
Without meiotic recombination, there is no mechanism for decoupling mildly deleterious mutations from beneficial ones. This should weaken purifying selection, and ultimately cause extinction, in a process known as Muller's Ratchet. Prior modeling work suggested this should have already happened.
The Amazon molly (Poecilia formosa) is the first clonal vertebrate species known to science. It was formed via hybridization between sister species P. latipinna and P. mexicana around 100kya; hybridization disrupted meiosis and produced an all-female, asexual lineage.
www.nature.com/articles/s41...
Excited to share a new study co-written with Ed Ricemeyer (LMU Munich), supervised by Manfred Schartl (U. of Würzburg) & Wes Warren (U. of Missouri).
We investigated how the Amazon molly has survived for 100,000 years (and even more generations) despite clonal reproduction.
www.nature.com/articles/s41...
Thanks for reading, and good luck checking IDs and keeping the riffraff out of your single cell data sets.
www.biorxiv.org/content/10.1...
github.com/nkschaefer/c...
In total, our study demonstrates the need for this set of tools, which provide new functionality, speed, and/or accuracy over existing tools. It also demonstrates the power of pooled single cell studies, including those involving composite cell lines, to discover new and interesting biology.
Back-mutations to the ancestral state at this site are uncommon, at a frequency typically seen in mitochondrial protein-coding or disease-implicated mutations. This suggests that this mutation may be one of the changes affecting gene regulation at this locus.
The affected locus (MT-ND3/MT-ND4L) was found by others (bmcbiol.biomedcentral.com/articles/10....) to be cleaved by an unknown mechanism at a site that we noticed is next to a fixed, derived human-specific mutation that might affect cleavage rates by altering the 3D shape of the RNA.
Mitochondrial genes are expressed as polycistronic transcripts, then cleaved and selectively degraded. We looked at species differences in this process, from two causes: nuclear and mitochondrial mutations. Interestingly, the biggest differences we found were compensatory, with little net effect.
By finding one fusion line that tended to retain both species’ mitochondria, we were able to home in on the gene network involved in this process: we can see what was turned up in the unhealthy cells, and what was turned down in those that survived.
We think this points to an incompatibility between allospecific mitochondria that causes gene dysregulation, as well as a nuclear self-destruct mechanism. Interestingly, a prior study also found that human cells have “a suicidal preference for self-mtDNA”: www.molbiolcell.org/doi/10.1091/...
Cells with two species’ mitochondria have significantly altered gene expression related to cell cycle arrest and apoptosis relative to other cells, suggesting they’re in trouble. They also express fewer mitochondrial transcripts overall and have abnormal post-expression transcriptional regulation.
After demultiplexing with CellBouncer, we found that composite cells mostly inherit only one species’ mitochondria: human, for human/chimpanzee cells, and bonobo, for chimpanzee/bonobo cells. Not always, though: some cells retained both mitochondria, or those from the less common species.
We take CellBouncer for a spin on a cool data set: inter-species composite iPSCs we created by cell fusion (www.nature.com/articles/s41...) for studying species differences in gene regulation. Here, we asked if there were biases in which species’ mitochondria were inherited by the composite cells.
doublet_dragon takes assignments from the other programs and infers a global doublet rate that encompasses both homotypic doublets (invisible to individual programs) and heterotypic ones. This can help with QC (given expectation based on cell loading density) and serve as a prior for other tools.
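To illustrate why homotypic doublets can be inferred even though no single program sees them, here is a back-of-the-envelope sketch. It assumes the two cells in a doublet are drawn independently according to the pool proportions, so a doublet is same-identity with probability equal to the sum of squared proportions. The function name is hypothetical, and this is a simplification for intuition, not doublet_dragon's actual model.

```python
def global_doublet_rate(heterotypic_fraction, proportions):
    """Rough global doublet rate from the observed heterotypic doublet fraction.

    heterotypic_fraction: fraction of droplets called as mixed-identity doublets.
    proportions: pool proportion of each identity (should sum to 1).
    Assumption: both cells in a doublet are sampled independently from the pool.
    """
    # Probability that a random doublet pairs two cells of the same identity.
    same_identity = sum(p * p for p in proportions)
    if same_identity >= 1.0:
        raise ValueError("need at least two identities to observe heterotypic doublets")
    # Observed heterotypic doublets are only the (1 - same_identity) visible share.
    return min(1.0, heterotypic_fraction / (1.0 - same_identity))
```

With two equally represented individuals, half of all doublets are homotypic, so an observed 5% heterotypic rate would imply roughly 10% doublets overall under this assumption.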
demux_tags assigns custom labels (e.g. MULTIseq/HTO data), or sgRNAs (CRISPR guide capture data) to cells. Our method considers the distribution of all tag counts together, rather than considering each tag independently, and handles noisy/low-count data better than some alternatives.
bulkprops takes genotypes and bulk data (or single cell data, ignoring cell barcodes) and infers the proportion of each individual in the pool. This can cross-check the other programs, and we provide a method to bootstrap proportions and get p-values when comparing two sets of proportions.
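The idea of inferring pool proportions from genotypes plus allele counts can be sketched as a small mixture model fit by expectation-maximization. Everything here (function name, input layout, the fixed error rate) is an assumption made for illustration; bulkprops' own inference procedure may differ.

```python
def estimate_pool_proportions(genotypes, ref_counts, alt_counts, n_iter=500, err=0.001):
    """EM estimate of each individual's share of a pool.

    genotypes[i][j]: alt-allele copies (0, 1, or 2) for individual i at site j.
    ref_counts[j], alt_counts[j]: pooled read counts at site j.
    err: assumed sequencing error rate (hypothetical default), keeps probabilities off 0 and 1.
    """
    n_ind = len(genotypes)
    n_sites = len(ref_counts)
    # q[i][j]: probability that a read from individual i at site j shows the alt allele.
    q = [[min(max(g / 2, err), 1 - err) for g in row] for row in genotypes]
    props = [1 / n_ind] * n_ind
    total = sum(ref_counts) + sum(alt_counts)

    for _ in range(n_iter):
        weight = [0.0] * n_ind
        for j in range(n_sites):
            # E-step: share of alt and ref reads at this site attributed to each individual.
            alt_lik = [props[i] * q[i][j] for i in range(n_ind)]
            ref_lik = [props[i] * (1 - q[i][j]) for i in range(n_ind)]
            alt_sum, ref_sum = sum(alt_lik), sum(ref_lik)
            for i in range(n_ind):
                weight[i] += (alt_counts[j] * alt_lik[i] / alt_sum
                              + ref_counts[j] * ref_lik[i] / ref_sum)
        # M-step: proportions are the normalized expected read counts.
        props = [w / total for w in weight]
    return props
```

Resampling sites with replacement and rerunning the estimate would be one simple way to bootstrap uncertainty on the proportions, in the spirit of the comparison described above.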
Additionally, quant_contam models the genotypic origins of ambient RNA, meaning it can highlight when specific donors or cell lines contribute disproportionately to ambient RNA. If expression data are provided, quant_contam can adjust counts to account for contamination.
quant_contam quantifies ambient RNA by measuring how often cells mismatch their expected genotypes. This introduces an external ground truth (genotype data), avoids the need to consider empty droplets, and can find ambient RNA in data lacking cell type diversity.
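The intuition behind using genotype mismatches as a contamination signal can be sketched with a simple ratio estimator. It assumes we look only at sites where the cell's assigned individual is homozygous reference, that any alt read there comes from ambient RNA, and that the ambient alt-allele frequency at each site is known. The function name and inputs are hypothetical; quant_contam's actual model is more involved.

```python
def estimate_ambient_fraction(sites):
    """Rough ambient RNA fraction for one cell (or a group of cells).

    sites: iterable of (alt_reads, total_reads, ambient_alt_freq) tuples, restricted
           to sites where the assigned individual is homozygous reference.
    Assumptions: no sequencing error, and ambient_alt_freq is known per site.
    """
    sites = list(sites)
    # Reads that contradict the expected genotype.
    mismatches = sum(alt for alt, _, _ in sites)
    # Mismatches we would expect if every read came from ambient RNA.
    expected_if_all_ambient = sum(total * freq for _, total, freq in sites)
    if expected_if_all_ambient == 0:
        return float("nan")
    return min(1.0, mismatches / expected_if_all_ambient)
```

Because the estimate relies on genotypes rather than expression profiles, it does not need empty droplets or multiple cell types, which matches the motivation given above.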
After running demux_mt, we suggest a pipeline that can produce a VCF file of nuclear variants and demultiplex more cells using demux_vcf. While not suited to every data set, this method works on whole-cell RNA-seq and single nucleus ATAC data, where we show it outperforms competing methods.
demux_mt addresses this problem by simultaneously clustering mitochondrial haplotypes and inferring the number of individuals in the pool. It takes only a BAM file. There is also a way to plot the haplotypes to see how well clustering worked.