Read the full preprint below 👇 If you're interested in the interface of bioengineering, DNA and virtual cell foundation models, and agentic reasoning, shoot me a note. We're hiring postdocs and ML researchers and starting some crazy new projects
www.biorxiv.org/content/10.1...
This work was a wonderful collaboration with Silvana Konermann, led by star graduate student Nick Perry with key contributions from the amazing Liam Bartie, Dhruva Katrekar, Gabe Gonzalez, Matt Durrant, James Pai, Alison Fanton, Masa Hiraizumi, Chiara Ricci-Tam, and Hiroshi Nishimasu
Arc is on 🔥
Bridge recombinases can modify the genome from single gene insertions to megabase-sized rearrangements
We're excited about programmable genome design at unprecedented length scales, especially when combined with AI-generated DNA sequences of high complexity (e.g. Evo 2)
Most people think of recombinases for payload insertion (e.g. of CARs or corrective genes)
We provide a therapeutic proof-of-concept with bridge-mediated excision of the BCL11A enhancer for sickle cell anemia and of expanded repeat sequences found in Friedreich's ataxia
But unlike other tools, bridge editing is not limited to insertion! We use IS622 for programmable, precise, and scarless genome rearrangements, inverting up to 0.93 Mb and excising up to 0.13 Mb
We then performed a systematic deep mutational scan of IS622 and combined a rationally engineered, high-activity recombinase mutant with our enhanced bridge RNAs to demonstrate 20% insertion efficiency into the human genome
Using these enhanced bridge RNAs, we discovered design principles for maximizing the specificity of insertion into the human genome, achieving as high as 82% specificity genome-wide
In a tour de force of molecular engineering, our team conducted computational ortholog mining, human cell activity screening, and structure-guided bridge RNA engineering to enhance the activity of IS622, a bridge system that showed promising but low activity in human cells
Bridge recombination systems are elegant molecular tools that utilize a recombinase enzyme and a programmable bridge RNA to "bridge" and recombine two distinct DNA molecules
This is a universal mechanism for insertion, excision, or inversion of any two DNA sequences
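To make the three outcomes concrete, here is a minimal toy sketch that treats a genome as a plain string. It only illustrates what insertion, excision, and inversion do to a sequence; it does not model the recombinase, the bridge RNA, or target and donor recognition. The function names and the example sequence are assumptions made up for illustration.

```python
# Toy string model of the three rearrangement outcomes (illustrative only;
# names and coordinates are hypothetical, not from the preprint).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def insert(genome: str, pos: int, payload: str) -> str:
    """Insert payload so that it starts at index pos."""
    return genome[:pos] + payload + genome[pos:]


def excise(genome: str, start: int, end: int) -> str:
    """Remove the half-open interval [start, end)."""
    return genome[:start] + genome[end:]


def invert(genome: str, start: int, end: int) -> str:
    """Replace [start, end) with its reverse complement."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


genome = "AAAGATTACATTT"
print(insert(genome, 3, "CCC"))  # AAACCCGATTACATTT
print(excise(genome, 3, 10))     # AAATTT
print(invert(genome, 3, 10))     # AAATGTAATCTTT
```

Inverting the same interval twice returns the original sequence, which is a quick sanity check on the toy model.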
Genomes encode biological complexity, which is determined by combinations of DNA mutations across millions of bases
In new work @arcinstitute.org, we report the discovery and engineering of the first programmable DNA recombinases capable of megabase-scale human genome rearrangement
This was a fun one: a new Endpoints Slack interview with @pdhsu.bsky.social
endpts.com/endpoints-sl...
are there good CROs for cell line engineering and generation?
Today, we're launching the Arc Virtual Cell Atlas, a growing resource for computation-ready single-cell measurements. arc-website-git-ben-virtual-cell-atlas-tool-arc-institute.vercel.app/news/news/ar...
At the @arcinstitute.org we are building AI models of cell state from the ground up, rethinking every step, from data generation to biologically relevant evaluation
Today we launch scBaseCamp, the largest public repository of single cell RNAseq data, uniformly processed from raw sequencing reads.
@thejohnnyyu.bsky.social, @therealnima.bsky.social, and I are excited to tell you about Tahoe-100M! The largest publicly available single-cell dataset that measures the effect of 1200 genes on 50 cell line models. The Vevo team has outdone itself. #Tahoe100M www.biorxiv.org/content/10.1...
Watch @thejohnnyyu.bsky.social, @therealnima.bsky.social (@vevotherapeutics.bsky.social), @pdhsu.bsky.social, Dave Burke, and me (@arcinstitute.org) talking about virtual cells and how #Tahoe100M, now on @arcinstitute.org's Virtual Cell Atlas, can change the game!
www.youtube.com/watch?v=ak_f...
New from @arcinstitute.org is "the largest publicly available #AI model for biology to date"!
Evo 2 now includes information from all domains of life to expand its capabilities in generative functional genomics. @pdhsu.bsky.social @brianhie.bsky.social
tinyurl.com/3t83vseh
This was an insane team effort between Arc and Nvidia that convened machine learning and computational biology researchers across Stanford, UC Berkeley, and UCSF. Especially grateful to Jensen Huang for his belief and support of this vision and labor of love, and the entire Evo 2 team below
Finally, if Evo 2 sounded exciting, @arcinstitute.org is hiring. Check out open Arc jobs at arcinstitute.org/jobs or just email me directly. Our research group is hiring in molecular machine learning and the interface of computational and synthetic biology
and a few more :)
NVIDIA BioNeMo: github.com/NVIDIA/bione...
NVIDIA NIM (Generation): build.nvidia.com/nvidia/evo2-...
NVIDIA NIM (Forward): build.nvidia.com/arc/evo2-40b
HuggingFace Evo 2 40B: huggingface.co/arcinstitute...
HuggingFace Evo 2 7B: huggingface.co/arcinstitute...
Here are some useful links:
Evo 2 preprint: arcinstitute.org/manuscripts/...
Evo Designer: arcinstitute.org/tools/evo/ev...
Evo Mech Interp Visualizer: arcinstitute.org/tools/evo/ev...
Evo 2 code: github.com/arcinstitute...
DNA is just the beginning. In middle school, we learn that genotype and the environment collaborate to create phenotype. We are incorporating Evo 2's understanding of genetic variation into Arc's virtual cell models that can be used for drug discovery and target ID
We're excited to see what the research community builds on top of this foundation model to enable the biological "app store"
Beyond pretraining scale, Evo 2 also scales at inference time. We demonstrate "generative epigenomics" by controlling the position and width of predicted chromatin accessibility to encode Morse code messages in the epigenome. Can you guess what's written below? .- .-. -.-.
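If you would rather decode than guess, here is a small sketch that translates the written Morse string into letters. It covers only the text in the post; it says nothing about how the accessibility peaks themselves were designed. The lookup table is standard International Morse code, and the function name is an assumption for illustration.

```python
# Standard International Morse code for the letters A-Z.
MORSE_TO_LETTER = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
}


def decode_morse(message: str) -> str:
    """Decode space-separated Morse symbols into a string of letters."""
    return "".join(MORSE_TO_LETTER[symbol] for symbol in message.split())


# Run this to reveal the message from the post.
print(decode_morse(".- .-. -.-."))
```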
Evo 2 can also be used for biological design. We demonstrate generation of entire human mitochondrial genomes with coherent synteny and even whole bacterial genomes and eukaryotic chromosomes (see the preprint for more detail)
A common critique of LLMs is that they're black boxes. To probe what Evo 2 is learning about biology (without any labels or annotations), we turned to mechanistic interpretability with Goodfire AI
Intriguingly, this AI brain has features that may correspond to regulatory elements
With a simple supervised model trained on Evo 2 embeddings, performance gets even better, reaching SOTA for coding mutations as well
Without any variant-specific training, architectural optimization, or multiple sequence alignments, Evo 2 can predict the pathogenicity of breast cancer-associated mutations in genes like BRCA1
It's state of the art in doing this zero-shot for noncoding mutations
Great, but what can it do? Evo 2 is a generalist model that can predict the pathogenic effects of human genome variants across coding and noncoding mutations
In other words, if you have a genetic mutation, Evo 2 has an opinion on whether or not it might cause disease