Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...
Posts by Koushik
@jengreitz.bsky.social & my lab want to co-hire a computational biologist/biostatistician with project management expertise to help map the regulatory code of the human genome and discover genetic mechanisms of disease.
Details below
careersearch.stanford.edu/jobs/computa...
Plz RT
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known.
Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.
please add me too. I work on ML for Plant Biology
An assessment of DNA language models concludes:
◼️ They do not offer compelling gains over baseline models
◼️ Their performance is inconsistent, and they require much more compute.
arxiv.org/abs/2412.05430
Our structural core gene pipeline Unicore is now published at GBE
📄 doi.org/10.1093/gbe/...
Please also check out @dongwookkim.bsky.social’s
🧵 bsky.app/profile/dong...
"A cacao tree with fruit pods in various stages of ripening. Taken on the Big Island (Hawaii) in the botanical gardens." "Chocolate is created from the cocoa bean." Photo by Medicaster, Wikimedia
The only reason you love chocolate is because of FUNGUS.
Cacao seeds contain high amounts of polyphenols, making them intensely bitter & unpleasant. There are two natural fungi that do the heavy lifting in turning them into chocolate.
Let's do a quick tour of the process of chocolate making.
Three BioML starter packs now!
Pack 1: go.bsky.app/2VWBcCd
Pack 2: go.bsky.app/Bw84Hmc
Pack 3: go.bsky.app/NAKYUok
DM if you want to be included (or nominate people who should be!)
AFESM: a metagenomic guide through the protein structure universe! We clustered 821M structures (AFDB & ESMatlas) into 5.12M groups, revealing biome-specific groups, only 1 new fold even after AlphaFold2 re-prediction & many novel domain combos. 🧵
🌐 afesm.foldseek.com
📄 www.biorxiv.org/content/10.1...
Super excited to share our review on genomic deep learning models for non-coding variant effect prediction, with Ayesha Bajwa and Nilah Ioannidis. We’d like this review to be a useful resource, and welcome any feedback, comments, or questions! 1/4
arxiv.org/abs/2411.11158
Overview of SAE methodology and representative SAE features revealed through automated activation pattern analysis
Using mechanistic interpretability to steer generations
SAE feature analysis and visualizations reveal features with diverse and consistent activation patterns
Mechanistic interpretability on a protein language model
www.biorxiv.org/content/10.1...
Two BioML starter packs now:
Pack 1: go.bsky.app/2VWBcCd
Pack 2: go.bsky.app/Bw84Hmc
DM if you want to be included (or nominate people who should be!)
DEGU distills an ensemble of models into a single model, retaining the ensemble’s predictive performance while providing uncertainty estimates, i.e., both epistemic (or model) and aleatoric (or data) uncertainty.
Led by @zrcjessica
Paper: www.biorxiv.org/content/10.1...
2/n
Large protein language models can learn complex epistatic interactions, but how much does that help with predicting variant effects? In this NeurIPS article, we show that classical independent-sites phylogenetic models can outperform pLMs on this task.
1/7
openreview.net/forum?id=H7m...
Thrilled to announce Boltz-1, the first open-source and commercially available model to achieve AlphaFold3-level accuracy on biomolecular structure prediction! An exciting collaboration with Jeremy, Saro, and an amazing team at MIT and Genesis Therapeutics. A thread!
I tried to make a bioml starter pack. DM if you want me to add or remove you?
go.bsky.app/2VWBcCd