Great work Žiga! Bit of a technical Q - "Predictions are adapted to the specific organism (human or mouse) by incorporating learned, organism-specific embeddings within these functions" - how were these embeddings learnt? During AlphaGenome training? Also, how are they incorporated?
Posts by Alan Murphy
Excited to launch our AlphaGenome API goo.gle/3ZPUeFX along with the preprint goo.gle/45AkUyc describing and evaluating our latest DNA sequence model powering the API. Looking forward to seeing how scientists use it! @googledeepmind
Just released tangermeme v0.5.0!
tangermeme implements "everything-but-the-model" for genomic ML. Essentially, train your model your way using your code-base (or load someone else's model), and tangermeme handles the discovery + design with it.
Try it out with `pip install tangermeme`.
We're thrilled to introduce PromoterAI — a tool for accurately identifying promoter variants that impact gene expression. 🧵 (1/)
Our preprint on designing and editing cis-regulatory elements using Ledidi is out! Ledidi turns *any* ML model (or set of models) into a designer of edits to DNA sequences that induce desired characteristics.
Preprint: www.biorxiv.org/content/10.1...
GitHub: github.com/jmschrei/led...
Completely agree these benchmarks are necessary! This is something we benchmarked against, and in some settings we only *just* beat it, even using Enformer as a pretrained model, when predicting epigenetic signals across cell types www.nature.com/articles/s41...
[SAVE THE DATE] MLCB 2025 is happening Sept 10-11 at the NY Genome Center in NYC!
Attend the premier conference at the intersection of ML & Bio, share your research and make lasting connections!
Submission deadline: June 1
More details: mlcb.github.io
Help spread the word—please RT! #MLCB2025
Great work! Did you look into how well hashFrag scales with large input windows (approaching Enformer/Borzoi receptive fields)? I'm guessing the MPRA data used in the paper must be ~200 bps?
Super excited to announce our latest flagship model Borzoi: major props to Johannes & David Kelley et al for advancing it. It's been a long journey from our prior Enformer model into this one. A few innovations: i) longer DNA context, ii) adaptation to predict RNA-seq abundance and splice isoforms,
I'm hiring:
1. Research associate (wet-lab w/ PhD) to generate MPRA perturbation data
2. ML postdoc to build multimodal generative AI for DNA (eg diffusion and LLMs)
3. Bioinformatician (any level) to process and harmonize functional genomics data to train foundation models
DM me if interested!
Figure 1 from the preprint, showing a schematic of the rat rotenone exposure experiments and the assays H3K27ac ChIP-seq and RNA-seq. It also shows the top altered genes from the ChIP-seq analysis in the substantia nigra and cortex in volcano plots.
Just in time for the holidays, we are thrilled to give you the latest preprint from our lab:
Unique nigral and cortical pathways implicated by epigenomic and transcriptional analyses in a rotenone rat model of Parkinson's disease
doi.org/10.1101/2024...
In a big life update, I successfully defended my PhD thesis - massive thanks to my PI Nathan and assessors @steinaerts.bsky.social @proftomellis.bsky.social . Thrilled to share that I will be joining @pkoo562.bsky.social at CSHL in the new year for a post-doc improving genomic deep learning models!
🚨 We’re hiring! 🚨
The Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory is looking for a Research Technician to join our team.
If you’re passionate about genomics, AI, and experimental science, we want to hear from you.
Help us spread the word!
jobrxiv.org/job/cold-spr...
So excited that our work on predicting gene expression from histone modifications using deep learning is out in NAR today. Brilliant to work with lead author @al-murphy.bsky.social and collaborators Aydan Askarova, @borislenhard.bsky.social and Nathan Skene 🧬⭐️🙏
academic.oup.com/nar/advance-...
(1/10) Excited to announce our latest work! @arpita-s.bsky.social, @amanpatel100.bsky.social , and I will be presenting DART-Eval, a rigorous suite of evals for DNA Language Models on transcriptional regulatory DNA at #NeurIPS2024. Check it out! arxiv.org/abs/2412.05430
My goal is to understand the regulatory role of every nucleotide in the genome, and how this changes across every cell in the human body.
If you are interested in doing a Ph.D. with me at UMass Chan Medical (Genomics and Comp Bio Department), see the links below. Deadline is Dec 1st.
Massive thanks to all co-authors for their work on this William Beardall, @marekrei.bsky.social, Mike Phuycharoen and Nathan Skene.
Enformer Celltyping’s predictions capture cell type-specific genetic enrichment for complex traits - a Heatmap of stratified LD score regression (s-LDSC)73 analysis for genetic variants associated with brain and immune diseases/traits and behavioural traits (sourced from associated GWAS) displayed as false discovery rate (FDR) value for significance of enrichment for ATAC-Seq chromatin accessibility signal, H3K27ac signal and Enformer Celltyping’s (EC) predictions of H3K27ac for microglia, neurons and oligodendrocytes (oligoden.). b -log10(FDR) genetic enrichment for the complex traits from s-LDSC. c Proportion of peaks in derived peak files used for s-LDSC analysis. The median, minima and maxima for the violin plots were Monocyte; 0.768, 0.380, 0.934, Neutrophil; 0.720, 0.475, 0.925 and T-Cell; 0.737, 0.159, 0.924.
A key finding was the current limitations of such models at genetic variant effect prediction - the same as others have found, like Ioannidis & Mostafavi labs. Despite this, Enformer Celltyping can also be used to study cell type-specific genetic enrichment of complex traits.
Delighted to share that our work developing a genomic DNN, Enformer Celltyping, to accurately predict epigenetic signals in previously unseen cell types has now been published doi.org/10.1038/s414...
Extending pretrained LM-inspired architectures for genome modeling and releasing a tool for predicting epigenetic signals while being cell type-agnostic. Happy to be a co-author on this excellent paper by @Al_Murphy_ , now in Nature Communications. www.nature.com/articles/s41...
Jessica Zhou (@zrcjessica) is a talented postdoc now on the job market looking for ML/data science industry positions in the NY area! If you have an open position reach out!
Please Repost to spread the word! 🙏🏻
🧬 Genomic DNNs can be trained to learn a lot of different aspects of gene regulation, but they're not perfect and we don't know which predictions are reliable and which ones aren't.
We introduce DEGU: Uncertainty-aware Genomic Deep Learning with Knowledge Distillation. 1/n