I'm reviving this academic job board and will keep maintaining it. Feel free to share it with friends on the job market. Suggestions are welcome! github.com/brianpenghe/...
Posts by Peng He 何鹏
📢New Perspective out! @penghe.bsky.social, @teichlab.bsky.social and colleagues review widely adopted batch correction methods and propose a path toward more informed, context-aware approaches for future method development. www.nature.com/articles/s43... 🖥️ 🧬
🔓 rdcu.be/e4pUJ
This paper wouldn't be possible without my wonderful co-authors: Shuang Li, Malte Lücken, and the support from the Marioni Lab and Teichmann Lab. 🤝
Huge thanks to the reviewers and community for the feedback! #SingleCell #Bioinformatics #scRNAseq #DataScience
8️⃣ The Future: Leverage high-quality references and interpretable models to convert "unknown" Class II effects into "known" Class I artifacts. 🔄
Long-term goal 🚀: Robust reference-based analyses resilient to batch effects—ultimately reducing the need for post hoc correction altogether.
7️⃣ Despite progress, gaps remain: ❌ "Black box": models often obscure which genes are corrected. ❌ Inefficiency: We often train models "from scratch" for every dataset. ❌ Lack of guidance on overlapping/redundant covariates.
6️⃣ Notably, while the exact link between genes and technical variance remains complex, emerging feature selection insights may pave the way for interpretable frameworks that explicitly model these sources rather than blindly correcting them.
5️⃣ We further divide each category into three subgroups.
⚙️ Data Cleaning methods relate to physical models, offering mechanistic insights and precise analysis. ⬛ Data Integration methods are highly effective but implicit, often functioning as "black boxes" where adjustments are harder to trace.
4️⃣ However, there is a long history of method development handling Class I variations (known artifacts like sequencing depth). 🧹
We classify these as "Data Cleaning" methods. They explicitly model technical noise sources—a critical step often overshadowed by integration.
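As a rough illustration of what explicitly modeling a known Class I artifact can look like, here is a minimal sketch of sequencing-depth normalization in plain Python. It is not taken from the paper: the function name `normalize_depth`, the `target_sum` of 10,000, and the toy counts are all assumptions chosen for the example.

```python
import math

def normalize_depth(counts, target_sum=10_000):
    """Scale each cell's counts to a common total, then apply log1p.

    `counts` is a list of cells, each a list of per-gene raw counts.
    `target_sum` is an illustrative choice, not a value from the paper.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell has no depth to correct; keep it as zeros.
            normalized.append([0.0] * len(cell))
            continue
        scale = target_sum / total
        normalized.append([math.log1p(c * scale) for c in cell])
    return normalized

# Two hypothetical cells with identical composition but 10x different depth.
shallow = [10, 30, 60]
deep = [100, 300, 600]
result = normalize_depth([shallow, deep])
```

After normalization the two toy cells have matching profiles, because the only difference between them was depth. The source of the correction is stated in the code itself, which is the sense in which this kind of step is easy to trace.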
3️⃣ It’s important to clarify that what is usually called "data integration" or "batch correction" (like MNN, Harmony, or scVI) is actually just one subset of methods.
These tools are typically designed for Class II effects—variations that are complex or batch-specific.
2️⃣ We classify batch effects into two categories (Fig 1a):
🔹 Class I: Better characterized, universally unwanted artifacts (e.g., ambient RNA, sequencing depth). 🔹 Class II: Poorly characterized, batch-specific variation (e.g., donor effects, protocol differences).
1️⃣ Batch effects remain the "elephant in the room" for single-cell genomics. 🐘
The core challenge? The trade-off between undercorrection (residual noise) and overcorrection (erasing fine-grained biological signals). We argue that not all batch effects are the same.
Our Perspective paper "Toward informed batch correction for single-cell transcriptome integration" is out now in Nature Computational Science! 📄✨
We review a decade of batch-correction methods and propose a move from "blind" integration to "informed" modeling. 🧵👇 🔗 rdcu.be/e4cSp
give it a try and you may get addicted : )
Being a new PI 🔽 youtube.com/shorts/llV5_...
🔬 Hiring: Computational Biology Postdoc @UCSF!
to develop:
1️⃣ Novel deep learning models for spatial/single-cell multiomics
2️⃣ Single-cell analyses across development & disease
3️⃣ Open-source tools for the broader community
Apply here: opportunities.ucsf.edu/content/open...
#CompBio #PostdocJobs
🎉Our new lab website is live! 🎉
Explore our research on single-cell🌃, multi-omics🔮, spatial-omics🗺️, gene regulation🧬, bioinformatics🤖, and development🌲. Meet our motivated team and stay tuned for our latest work 🚀 🌟 New members at all levels are welcome!
peng-he-lab.github.io
The He Lab is looking for a computational research assistant to help us build the best cell atlases in the world and dissect gene regulatory networks. This position could suit recent graduates who want a gap year before applying to grad school. More details: aprecruit.ucsf.edu/JPF05334
Smart design. I do have code to pool the replicates, demultiplex them together, and then separate the replicates, in case you need it.
I've done 6 with uneven mixing. I'd guess 10+ is possible with even mixing + 30k cells per library.
I'm maintaining an active list of biology-related jobs at various levels (PIs, staff scientists, postdocs, PhDs, etc.). Please feel free to subscribe or contribute! github.com/brianpenghe/...