cxqiu (@cxqiu) on Bluesky

Excited to share our new paper in @nature.com! We developed PerturbFate, a scalable single-cell platform for discovering how diverse genetic perturbations converge on a shared drug-resistant cell state, and the key programs driving it. Led by our incredible Zihan Xu from @rockefeller.edu.
rdcu.be/fdC14

5 days ago

CREsted is finally out! You can find the article, together with a summarizing Research Briefing, in the thread. 🦎

1 week ago

Latest from the Shendure & Qiu labs (@cxqiu.bsky.social)! We combined a new 4M-cell mouse whole-embryo scATAC-seq atlas (E10-P0), millions of 'evolutionarily coherent' orthologs from 241 mammalian genomes (Zoonomia), and the CREsted CNN framework (@steinaerts.bsky.social).

1 week ago

We thank Q-T-π (Canis familiaris), Tater and Tot (Felis catus) for inspiration. Nothing in biology makes sense except in the light of evolution; apparently that includes AI.

1 week ago

Huge team effort led by CX Qiu, Riza Daza, and Ian Welsh. Jay Shendure supervised the project. Key contributions from Rupali Patwardhan, @niklaskemp.bsky.social & @steinaerts.bsky.social (CREsted), built on our mouse timelapse with @coletrapnell.bsky.social. Grateful to the whole team and Zoonomia.

1 week ago

Everything is open: interactive preprint, count matrices, models, all 7,712 prediction tracks, code & reproducible figures → doi.org/10.62329/hxkk6249. Raw data: GEO GSE325776. Code: github.com/ChengxiangQiu/jax-atac-code

1 week ago

Limitations we're upfront about: promoter suppression is still heuristic, some species bias remains, and all labels derive from mouse. Matched atlases in a few more species would help a lot. This is v1 — substantial headroom remains.

1 week ago

Model organisms aren't just for cataloging biology — they're training substrates for AI models of human biology. Mouse experimental depth + mammalian sequence diversity = virtual access to human regulatory landscapes we can't profile directly.

1 week ago

We applied STEAM to all 241 Zoonomia genomes: 32 × 241 = 7,712 genome-wide enhancer tracks. HumMus for human + mouse. BabaGanoush for the full spread!

1 week ago

Some favorites — human enhancers with no mouse ortholog, validated by fetal accessibility:
> FECH intron 1 (erythroid, heme biosynthesis)
> upstream of TFRC (erythroid, iron uptake)
> upstream of APOB (hepatocyte, LDL cholesterol)
> upstream of CYP2C19 (hepatocyte, drug metabolism)

1 week ago

Even for human enhancers with NO mouse ortholog at all, STEAM predicts the right cell class. Hepatocyte-predicted elements are more accessible in human hepatoblasts; erythroid-predicted elements in erythroblasts. 7× difference in the expected direction.

1 week ago

Key validation: human-only predicted enhancers are 8–9× more accessible than mouse-only predictions in the corresponding human fetal cell type — using Domcke et al. human fetal accessibility data the model never saw. Evolutionary transfer learning works.

1 week ago

We apply STEAM genome-wide to human + mouse: ~340K enhancers per species across 32 cell classes. Jaccard co-occurrence of enhancer predictions recovers nearly identical lineage structure in both species — regulatory logic is deeply shared.

1 week ago
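A minimal sketch of the Jaccard co-occurrence idea, assuming each cell class is summarized as the set of genomic elements predicted as enhancers for it. The function names, class names, and element IDs are hypothetical toy values, not data or code from the preprint; the resulting class-by-class similarities could then be clustered (e.g. hierarchically) and compared between human and mouse.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(predictions: dict) -> dict:
    """Pairwise Jaccard similarity between cell classes' predicted-enhancer sets.

    `predictions` maps a cell-class name to the set of element IDs predicted
    as enhancers for that class (hypothetical input format).
    """
    classes = sorted(predictions)
    return {
        (x, y): jaccard(predictions[x], predictions[y])
        for x, y in combinations(classes, 2)
    }


# Hypothetical toy predictions: element IDs called as enhancers per cell class.
toy = {
    "hepatocyte": {"e1", "e2", "e3", "e4"},
    "erythroid": {"e5", "e6", "e7"},
    "megakaryocyte": {"e5", "e6", "e8"},
}
sims = jaccard_matrix(toy)
```

In this toy input the two blood-related classes share half of their combined elements, while neither overlaps the hepatocyte set, which is the kind of lineage grouping the post describes.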

STEAM resolves 11 synteny groups of hepatocyte enhancers at the Afp locus: orthologous enhancer families with shared ancestry but divergent sequences. Some are deeply conserved, others lineage-restricted (e.g. one group found only in Old World monkeys).

1 week ago

The payoff: at the Afp locus, hepatocyte enhancer predictions jump from 1.2/species to 4.6/species across 136 mammals. Signal-to-noise goes from 3× to 15×. The Mus-restricted bias largely disappears.

1 week ago

Performance scales with phylogenetic breadth, plateauing at ~32 species, but even partial inclusion accelerates convergence dramatically.

1 week ago

Enter STEAM: we augment training with syntenic enhancer orthologs from up to 241 Zoonomia genomes — a ~200× expansion in sequence diversity, preserving cell-class labels. Orthologous enhancers are nature's data augmentation: divergent sequences, shared function.

1 week ago
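A minimal sketch of ortholog-based augmentation as described above, assuming a mapping from each mouse training element to its syntenic ortholog sequences in other genomes. `Example`, `augment_with_orthologs`, and the toy sequences are hypothetical; this is not STEAM's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    sequence: str
    label: str    # cell-class label inherited from the mouse element
    species: str


def augment_with_orthologs(mouse_examples, orthologs):
    """Expand each mouse training example with its syntenic orthologs.

    `mouse_examples` maps element ID -> Example (species "mouse");
    `orthologs` maps element ID -> {species: ortholog sequence}.
    Every ortholog inherits the mouse element's cell-class label, so the
    label set is unchanged while sequence diversity grows.
    """
    augmented = []
    for element_id, ex in mouse_examples.items():
        augmented.append(ex)
        for species, seq in orthologs.get(element_id, {}).items():
            augmented.append(Example(sequence=seq, label=ex.label, species=species))
    return augmented
```

One design consideration (an assumption, not something the post specifies): keeping every ortholog of an element in the same train/validation/test split avoids leaking near-identical sequences across splits.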

But the evolution-aware model doesn't transfer across species. At the Afp locus, hepatocyte predictions light up in Mus and go dark elsewhere. The culprit: insufficient sequence diversity from training on one genome.

1 week ago

These predicted enhancers inform gene expression: stronger enhancer predictions near a gene → higher cell-class-specific expression, with clear distance-dependent decay. This holds across all 32 developmental lineages.

1 week ago
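One simple way to encode distance-dependent decay is to weight each nearby enhancer prediction by an exponential function of its distance to the gene's TSS and sum the result. The exponential kernel, the 50 kb decay length, the 500 kb window, and the function name are hypothetical illustration choices, not parameters from the preprint.

```python
import math


def gene_enhancer_score(tss, enhancers, decay_bp=50_000, window_bp=500_000):
    """Distance-weighted sum of enhancer prediction scores around one gene.

    `enhancers` is a list of (position_bp, predicted_score) pairs on the
    gene's chromosome. Each score is multiplied by exp(-distance / decay_bp);
    elements farther than `window_bp` from the TSS are ignored.
    """
    total = 0.0
    for pos, score in enhancers:
        d = abs(pos - tss)
        if d <= window_bp:
            total += score * math.exp(-d / decay_bp)
    return total
```

Per-gene scores like this, computed separately for each cell class, could then be correlated with cell-class-specific expression.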

Compare the Alb/Afp locus: the evolution-naive model predicts promoters, tandem-repeat artifacts, and real enhancers. The evolution-aware model distills this to six clean hepatocyte-specific elements, trimmed to core regions of 154–474 bp.

1 week ago

Filtering yields 32 cell-class-specific enhancer clusters that map one-to-one onto developmental lineages, plus one large promoter cluster — cleanly separated. The evolution-aware model trained on these eliminates both failure modes.

1 week ago

Step three: use evolution to clean up. Real enhancers should be syntenically retained across mammals AND show coherent predicted activity across orthologs. Both filters yield clean bimodal distributions — nature's quality control.

1 week ago
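The two evolutionary filters can be sketched as per-element scores, assuming retention is the fraction of genomes with a syntenic ortholog and coherence is the fraction of those orthologs whose top predicted cell class matches the mouse label. These definitions, the function name, and the 0.5 thresholds are hypothetical stand-ins for whatever the preprint actually uses.

```python
def passes_evolutionary_filters(ortholog_calls, n_species, label,
                                min_retention=0.5, min_coherence=0.5):
    """Keep an element only if it is (1) syntenically retained and (2) coherent.

    `ortholog_calls` maps species -> predicted top cell class for that
    species' syntenic ortholog (species lacking an ortholog are absent).
    Retention = fraction of the `n_species` genomes with an ortholog;
    coherence = fraction of those orthologs whose predicted class equals `label`.
    """
    if not ortholog_calls:
        return False
    retention = len(ortholog_calls) / n_species
    coherence = sum(c == label for c in ortholog_calls.values()) / len(ortholog_calls)
    return retention >= min_retention and coherence >= min_coherence
```

Because the post reports bimodal distributions for both quantities, placing each threshold in the trough between the two modes would be a natural choice.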

But when we tile the WHOLE genome? Two failure modes: tandem repeats generate massive false positives (24× enrichment), and promoter grammar contaminates distal enhancer predictions. Strong performance on peaks ≠ reliable genome-wide inference.

1 week ago
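A toy sketch of genome tiling plus a fold-enrichment calculation for tandem repeats: the share of predicted-positive windows overlapping a repeat, divided by the share of all tiled windows that do. Window width, stride, repeat coordinates, and function names are hypothetical; the toy usage at the end evaluates to 5.0 (a 5× enrichment), not the 24× reported in the post.

```python
def tile(chrom_len, width, stride):
    """Start coordinates of fixed-width windows tiled along a chromosome."""
    return list(range(0, chrom_len - width + 1, stride))


def overlaps(start, width, intervals):
    """True if the half-open window [start, start + width) overlaps any (s, e) interval."""
    end = start + width
    return any(s < end and start < e for s, e in intervals)


def fold_enrichment(predicted, all_windows, width, repeats):
    """(share of predicted windows hitting repeats) / (share of all windows hitting repeats)."""
    frac_pred = sum(overlaps(w, width, repeats) for w in predicted) / len(predicted)
    frac_all = sum(overlaps(w, width, repeats) for w in all_windows) / len(all_windows)
    return frac_pred / frac_all


# Hypothetical toy chromosome: 10 windows, one of which overlaps a repeat.
windows = tile(1_000, width=100, stride=100)
repeats = [(150, 160)]
enrichment = fold_enrichment([100, 500], windows, 100, repeats)
```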

Step two: train CREsted to predict cell-class-specific accessibility from DNA sequence. Strong performance on held-out peaks (r = 0.74), with lineage structure clearly recovered.

1 week ago
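A minimal sketch of two generic pieces of this step: one-hot encoding DNA for a sequence-to-accessibility CNN, and scoring held-out predictions with Pearson r. It does not use CREsted's API; both functions are hypothetical helpers, and computing r over flattened peak × cell-class values is only one of several possible conventions.

```python
import numpy as np


def one_hot(seq: str) -> np.ndarray:
    """One-hot encode DNA as a (len, 4) array in A, C, G, T order; N and other symbols -> all zeros."""
    lookup = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = lookup.get(base)
        if j is not None:
            out[i, j] = 1.0
    return out


def pearson_r(pred: np.ndarray, obs: np.ndarray) -> float:
    """Pearson correlation between predicted and observed accessibility (flattened)."""
    return float(np.corrcoef(pred.ravel(), obs.ravel())[0, 1])
```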

The atlas integrates tightly with our matched scRNA-seq timelapse (11M cells, same embryo cohort). Nearest neighbors in the co-embedding land right on matching timepoints.

1 week ago
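A brute-force sketch of the nearest-neighbor check, assuming ATAC and RNA cells share a co-embedding and each cell carries a timepoint in hours. The function name and the `tol` parameter are hypothetical; at millions of cells one would swap in an approximate nearest-neighbor index rather than the dense distance matrix used here.

```python
import numpy as np


def nearest_timepoint_agreement(atac_emb, atac_time, rna_emb, rna_time, tol=0.0):
    """Fraction of ATAC cells whose nearest RNA neighbor has a matching timepoint.

    Embeddings are (n_cells, n_dims) arrays in a shared co-embedding space;
    timepoints share a common unit (e.g. hours). A match means the absolute
    time difference is at most `tol`. Brute-force Euclidean search: toy sizes only.
    """
    # Pairwise squared distances, shape (n_atac, n_rna).
    d2 = ((atac_emb[:, None, :] - rna_emb[None, :, :]) ** 2).sum(axis=-1)
    nn = d2.argmin(axis=1)
    return float(np.mean(np.abs(rna_time[nn] - atac_time) <= tol))
```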

Step one: the atlas. 3.9M nuclei by sci-ATAC-seq3 from 36 whole mouse embryos, one per 6-hour bin, E10 to P0. No dissection. We resolve 13 lineages → 36 cell classes → 140 cell types across the full arc of organogenesis.

1 week ago

The core idea: cis-regulatory sequences evolve fast, but the trans-acting programs that read them evolve slowly. This mismatch — the same principle that powered AlphaFold — means models trained on mouse enhancers should generalize to orthologous cell types across Mammalia.

1 week ago

New preprint @cxqiu.bsky.social @jshendure.bsky.social ! Can we learn regulatory grammars of human cell types — by training on mouse development and transferring across 241 mammalian genomes? Introducing STEAM & a whole-organism scATAC-seq atlas from E10 to birth.
www.biorxiv.org/content/10.6...

1 week ago

Grateful to my mentors @jshendure.bsky.social, @coletrapnell.bsky.social, @cbmoens.bsky.social, @wnoble.bsky.social, Bob Waterston, @ksusztak.bsky.social, and Qinghua Cui for their guidance and support along the way 🙏 !!

5 months ago

Thrilled to share I’ve started my lab at Dartmouth’s Geisel School of Medicine! We focus on mapping cellular trajectories & TF networks in development and Mendelian disorders, exploring new therapies. Join us—postdocs, grads, and scientists welcome! sites.dartmouth.edu/qiulab/

5 months ago
