The premier conference on Machine Learning for Computational Biology is Sep 9-10 at the NY Genome Center in NYC!
Submission deadline is June 1 for 2-page abstracts and 8-page papers (eligible for proceedings track).
Registration is now open! (Link below)
Please retweet!
Posts by Aaron Wenteler
13/13 Thanks to all the amazing collaborators: Martina Occhetta, Nik Branson, Magdalena Huebner, Victor Curean, Will Dee, Will Connell, Alex Hawkins-Hooker, Pui Chung, Yasha Ektefaie, Amaya Gallagher-Syed (@amayags.bsky.social) and César Córdova.
12/13 We plan to maintain and expand PertEval, creating a comprehensive benchmarking suite for the research community. Community contributions are very much encouraged!
Paper 📃: www.biorxiv.org/content/10.1...
GitHub 💻: github.com/aaronwtr/Per...
11/13 Looking ahead, we believe progress in this field will specifically require two key elements:
- Higher-quality data spanning a wider range of cellular states and perturbations
- Specialized models designed to fully leverage large-scale datasets for perturbation prediction
10/13 These findings highlight important challenges in using scFMs for perturbation effect prediction. While scFMs have potential, our results suggest that current models aren't yet optimized for this specific task.
9/13 Our analysis revealed that all models struggle to predict strong or atypically distributed perturbations and mostly learn average perturbation effects in a zero-shot setting. This highlights the need for training data that better represents cellular states and responses to perturbations.
8/13 Many perturbation prediction evaluations use 2,000 HVGs, while most genes don't show a strong response. However, even when narrowing down to the top 20 DEGs per perturbation, some scFM embeddings only slightly outperformed the baseline methods, while others still didn't.
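To make the point about 2,000 HVGs versus the top 20 DEGs concrete, here is a minimal sketch on toy data. It is an illustration under stated assumptions, not the PertEval-scFM implementation: the helper names (`top_k_deg_indices`, `mse_on_top_degs`) are hypothetical, and DEGs are ranked by absolute mean expression change, whereas a real pipeline would typically use a statistical test.

```python
import numpy as np

def top_k_deg_indices(control_mean, perturbed_mean, k=20):
    """Indices of the k genes with the largest absolute mean change.

    Assumption: ranking by |perturbed - control| stands in for a proper
    differential expression test.
    """
    effect = np.abs(perturbed_mean - control_mean)
    return np.argsort(effect)[::-1][:k]

def mse_on_top_degs(control_mean, perturbed_mean, predicted_mean, k=20):
    """Mean squared error restricted to the top-k DEGs of one perturbation."""
    idx = top_k_deg_indices(control_mean, perturbed_mean, k)
    return float(np.mean((predicted_mean[idx] - perturbed_mean[idx]) ** 2))

# Toy data: 2,000 genes, of which only the first 20 respond to the perturbation.
rng = np.random.default_rng(0)
n_genes = 2000
control_mean = rng.normal(size=n_genes)
perturbed_mean = control_mean.copy()
perturbed_mean[:20] += 3.0

# A trivial baseline that predicts "no change from control".
no_change_prediction = control_mean

mse_all_genes = float(np.mean((no_change_prediction - perturbed_mean) ** 2))
mse_top20 = mse_on_top_degs(control_mean, perturbed_mean, no_change_prediction, k=20)
print(f"MSE over all genes: {mse_all_genes:.3f}")
print(f"MSE over top-20 DEGs: {mse_top20:.3f}")
```

In this toy setup the no-change baseline looks far better when averaged over all genes than when scored only on the genes that actually respond, which is why restricting to DEGs gives a stricter test.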
7/13 We found that current-generation zero-shot scFM embeddings showed no significant improvement over the task-specific model GEARS, or even over simple baselines, when predicting perturbation effects across 2,000 highly variable genes (HVGs).
6/13 On top of this, our framework also considers distribution shift, a frequently overlooked factor. We applied PertEval to evaluate zero-shot embeddings from several scFMs: scBERT, Geneformer, scGPT, scFoundation, and UCE.
5/13 PertEval-scFM includes three metrics:
- Area Under the SPECTRA Performance Curve (AUSPC)
- E-distance
- Pre-train / fine-tune cosine similarity (contextual alignment)
Each metric provides unique insights into model behaviour and robustness.
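For readers unfamiliar with two of these metrics, a rough sketch follows. It assumes the standard energy distance with Euclidean pairwise distances and plain cosine similarity between two vectors. The exact definitions used in PertEval-scFM may differ, and the function names here are hypothetical.

```python
import numpy as np

def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (row of a, row of b)."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean()

def energy_distance(x, y):
    """Energy distance between two samples of shape (cells, features).

    Assumption: E(X, Y) = 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||,
    estimated with all pairs (including self-pairs).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy example: "control" and "perturbed" cells in a 5-dimensional feature space.
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=(200, 5))
perturbed = rng.normal(1.0, 1.0, size=(200, 5))

print("E-distance, control vs control:", energy_distance(control, control))
print("E-distance, control vs perturbed:", energy_distance(control, perturbed))
print("Cosine similarity of mean profiles:",
      cosine_similarity(control.mean(axis=0), perturbed.mean(axis=0)))
```

A larger E-distance means the perturbed population sits further from control as a whole distribution, which is a way to quantify how strong a perturbation is.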
4/13 Our framework introduces a standardized toolkit of metrics designed to provide a nuanced evaluation of perturbation effect prediction model performance. Such a framework facilitates meaningful comparisons across different approaches and datasets.
3/13 Currently, there's no agreement on how to compare different approaches for perturbation effect prediction. This makes it challenging to determine which models truly perform best, or to identify areas for improvement. PertEval-scFM aims to change that.
2/13 With the rapid rise of models and scFMs for this task, it's more important than ever to have standardized evaluation methods. PertEval-scFM provides a comprehensive framework to assess these AI models in predicting cellular responses to genetic perturbations.
1/13 Excited to share that PertEval-scFM got accepted into ICML 2025 🇨🇦!
We provide benchmark and evaluation tools for perturbation effect prediction models, including single-cell foundation models (scFMs). Paper and GitHub link at the end of the thread! 🧵👇
Thank you for your great work. We have a copy here at the office proudly on display 😎
Am I the only one who feels GPT 4.5 is actually worse than 4o? Its prompt adherence seems to be bad in my experience. Thought that might be beneficial for creative tasks, but even there, I feel like the outputs it generates are underwhelming compared to 4o
🔥Our paper "BioX-CPath: Biologically-driven explainable Diagnostics for Multistain IHC Computational Pathology" was accepted at #CVPR2025!
🚀We'll be releasing the paper and code repo ASAP, stay tuned! #multistain #IHC #pathology #ExplainableAI #GNNs #PrecisionMedicine #Immunolog
Announcing Evo 2: The largest publicly available AI model for biology to date, capable of understanding and designing genetic code across all three domains of life. t.co/1Zt6gQ74SA
Attended an incredible talk by @philipcball.bsky.social in Oxford. He covered a big chunk of genetics and molecular biology at lightning speed without compromising on clarity. He also convincingly explained why the central dogma is outdated. Go check out his latest book, How Life Works
It was a pleasure talking about our recent single-cell foundation model benchmark, PertEval-scFM, at the Multiomics Reading Group at Mila. Many thanks to the organizers for the invitation and to @valenceai.bsky.social for sharing the talk. Check it out here: youtu.be/DCezfwQkkAE?...
Excited to attend this, looking forward to it!
Genes & Health is excited to contribute 55,000 high-quality exomes from British South Asian volunteers to gnomAD, a global genetic resource. This open-access data will advance rare disease diagnosis and treatment, thanks to our amazing volunteers. #GenesAndHealth #Genomics #RareDiseases
A month ago we @vevotherapeutics.bsky.social announced that we have generated the largest single-cell perturbation atlas in history, Tahoe-100M. Today, we announce that we will fully open-source Tahoe-100M in Feb, as part of a collaboration with NVidia health to train cell state models.
How can we build an AI virtual cell that simulates all functions and interactions of a cell? How will it transform research and drive breakthroughs in programmable biology, drug discovery and personalized medicine?
Take a look at our paper in @cellpress.bsky.social!
www.cell.com/cell/fulltex...
My first time submitting to a big ML conference. Very frustrating experience after having worked really hard to address all the reviewers’ concerns only to be met with silence once we completed and shared the results. Hoping the meta-reviews will be better
Where can I find a comprehensive and reliable resource for protein family annotations based on a gene name? I’ve explored Pfam / InterPro, but the annotations seem inconsistent or incomplete. Are there other tools or databases that provide more comprehensive or reliable annotations?
I tried looking for it but wasn’t able to find it. Thank you!
Nice to meet you Pat!
Any people here into bio x ML? #multiomics #AI #ML #genomics #proteomics #drugdiscovery