@lorenzopantolini.bsky.social and I are headed to @iclr-conf.bsky.social in Rio soon, with talks about this work at the @gembioworkshop.bsky.social and LMRL workshops. Reach out to chat about representation learning for de novo protein design! 🫖
Posts by Janani Durairaj (Jay)
ROCKET 🚀 inference-time optimization of AlphaFold to fit structural data is published! rdcu.be/fa9YH
Since our preprint, we’ve pushed it to regimes where other methods break: low resolution, weak signal, real experimental edge cases. Here’s what we learned: 1/15
Very happy to have had a chance to attack an initially very low-resolution #cryo-EM map with #ROCKET! Thank you again @alisiafadini.bsky.social and all other co-authors of this important work, which truly shows the power of combining experimental structural biology and #AI inference.
rdcu.be/fa9YH
ROCKET enables model building of a ZPD filament from low-resolution cryo-EM
Starting from an #AlphaFold-Multimer prediction, we used #ROCKET to build a model of ZPD, a homopolymeric zona pellucida (#ZP) protein, into an initial #cryo-EM map at only ~9 Å resolution. A subsequently obtained 4.6 Å map highlighted how superior the ROCKET model was over the initial prediction:
Stoic 🦾 from our shared student @daniil-litvinov.bsky.social predicts protein complex stoichiometry. A fun collab with @ninjani.bsky.social @torstenschwede.bsky.social - this #AI adventure beyond our core #CryoET methods was made possible by the @biozentrum.unibas.ch PhD Fellowship Program! 🧪 🧶🧬
Check out this awesome work from @daniil-litvinov.bsky.social: Protein complex stoichiometry prediction (both homomers and heteromers) from sequence, with some nice ablations showing what makes the difference!
New OpenFold3 preview out! (OF3p2)
It closes the gap to AlphaFold3 for most modalities.
Most critically, we're releasing everything, including training sets & configs, making OF3p2 the only current AF3-based model that is functionally trainable & reproducible from scratch🧵1/9
The Critical Assessment of Structure Prediction (CASP) experiment is calling for prediction targets: Immune Complexes, Organic Ligand-Protein Complexes, Nucleic Acids and Complexes, Conformational Ensembles, Difficult Protein Structures and Complexes. Rule of Thumb: If AlphaFold3 can generate a high-quality model, it is likely not a CASP-grade challenge. If it struggles, we want it.
Is #AI hitting a plateau in structure prediction? Help us find out at CASP17! 🧪🧬
Calling for Targets: Immune Complexes, protein - ligand complexes, RNA/DNA, conformational ensembles, membrane proteins, viral origins, and large complexes.
The Rule of Thumb: If AF3 can’t model it, we want it.
Five years ago, we released FLIP. The core question was: can ML models for protein fitness prediction generalize in the ways that actually matter for protein engineering, i.e. low data, extrapolation to more mutations, out-of-distribution sequences?
Remote homology and protein design: two sides of the same coin. Instead of finding remote homologs, we used TEA to design completely de novo proteins, folding into desired TEA sequences.
I always love working with Jay, and “speed-running” this proof of concept was no exception.
Also a great time to showcase @lorenzopantolini.bsky.social's awesomeness as he slowly starts the job hunt! If you need someone with a deep understanding of biological latent spaces and how to exploit them for practical applications, he's your guy.
This was a speed-run to validate the in silico proof-of-concept, but the possibilities are endless. It may represent a path orthogonal to current structure-based methods. We're working on adaptations and, of course, looking to experimentally validate. (8/n)
Previous MCMC works used a contact map loss (needed ~170k steps, Verkuil et al. 2022) or ESMFold pTM (i.e. folding at every step, Hie et al. 2022). By optimising a 1D sequence with TEA, we see a >10x speed increase. (7/n)
For unconditional design, we get high-pLDDT proteins unlike any known sequences. A small TEA k-mer diversity loss helped steer us away from simple coiled-coils toward complex secondary structure combos. (6/n)
For template-guided design, we generated novel sequences predicted to fold into both de novo and natural scaffolds (AF2 single seq). Many have a NEFF of 1. No structures were used in the making of these designs. (5/n)
The approach:
1. Take a random sequence
2. Randomly mutate
3. Accept/reject via Metropolis criterion based on ESM2 likelihood + TEA template match (or TEA entropy if unconditional).
This is fast, 30k steps in ~25min. (4/n)
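The three steps above can be sketched as a plain Metropolis loop. This is only a minimal illustration, not the authors' code: `toy_energy` is a hypothetical stand-in for the real objective (ESM2 likelihood plus TEA template match, or TEA entropy), and the temperature, single-site mutation move and function names are all assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq, target):
    """Hypothetical stand-in objective: fraction of positions that differ
    from a target string. Lower is better. The real method would score
    with ESM2 likelihood and a TEA-based term instead."""
    return sum(a != b for a, b in zip(seq, target)) / len(target)


def metropolis_design(energy_fn, length, steps, temperature=0.01, seed=0):
    """Random start, random single-site mutation, Metropolis accept/reject."""
    rng = random.Random(seed)
    # 1. Take a random sequence
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    energy = energy_fn(seq)
    for _ in range(steps):
        # 2. Randomly mutate one position
        pos = rng.randrange(length)
        old_residue = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_energy = energy_fn(seq)
        delta = new_energy - energy
        # 3. Metropolis criterion: always accept improvements, accept
        #    worse proposals with probability exp(-delta / temperature)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
        else:
            seq[pos] = old_residue  # reject: undo the mutation
    return "".join(seq), energy


target = "MKTAYIAKQR"  # arbitrary toy target, for illustration only
designed, final_energy = metropolis_design(
    lambda s: toy_energy(s, target), length=len(target), steps=5000
)
print(designed, final_energy)
```

Swapping `toy_energy` for a real scoring function is the only change needed to try other objectives; the loop itself never looks at structure.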
We noticed that TEA logit entropy correlates well with structure prediction confidence (pLDDT). Ideally, we could combine the ESM2 likelihood (naturalness) with TEA (structural consistency) to guide design. (3/n)
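For readers unfamiliar with "logit entropy", here is one way it could be computed. The post does not spell out the exact definition, so this sketch assumes the Shannon entropy of the softmax over each position's logits, averaged along the sequence; the function names and the list-of-lists input format are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_logit_entropy(logits_per_position):
    """Mean per-position Shannon entropy (in nats).

    Assumed input: one list of logits per residue, e.g. 20 values for a
    20-letter alphabet. Low entropy means confident letter assignments."""
    entropies = []
    for logits in logits_per_position:
        probs = softmax(logits)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return sum(entropies) / len(entropies)


uniform = [[0.0] * 20]          # maximally uncertain position
peaked = [[10.0] + [0.0] * 19]  # one letter strongly preferred
print(mean_logit_entropy(uniform), mean_logit_entropy(peaked))
```

A uniform distribution over 20 letters gives the maximum value, ln 20, and a sharply peaked one approaches zero.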
We recently released The Embedded Alphabet (TEA), a tiny head on top of ESM2 converting amino acids into a new 20-letter structural alphabet. Great for search (see bsky.app/profile/lore...), but we wondered: could we use it for generation? (2/n)
A fun little idea that worked surprisingly well, using a structure-informed yet structure-independent alphabet for de novo protein design: www.biorxiv.org/content/10.6...
🧵(1/n)
My time in @martinsteinegger.bsky.social's group is ending, but I’m staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org
I'm really excited to break up the holiday relaxation time with a new preprint that benchmarks AlphaFold3 (AF3)/“co-folding” methods with 2 new stringent performance tests.
Thread below - but first some links:
A longer take:
fraserlab.com/2025/12/29/k...
Preprint:
www.biorxiv.org/content/10.6...
Thanks a lot for the review! We somehow missed it until quite recently, but I think we addressed a good chunk of the comments in the revision anyway. Looking forward to your thoughts when it's out.
🚀 New paper in @natmethods.nature.com!
We present OpenStructure's powerful scoring capabilities, used to assess predictions in CAMEO and CASP.
Read the full study here:
🔗 doi.org/10.1038/s415...
#StructuralBiology #Bioinformatics #OpenStructure #CASP #CAMEO #ProteinStructure
Been excited about this one for a while! What would you do with a new alphabet and the wealth of protein sequence bioinformatics at your disposal? We're also around at #EMBOComp3D Heidelberg and MLSB Copenhagen this week to discuss
OpenFold3-preview (OF3p) is out: a sneak peek of our AF3-based structure prediction model. Our aim for OF3 is full AF3-parity for every modality. We now believe we have a clear path towards this goal and are releasing OF3p to enable building in the OF3 ecosystem. More👇
This October I’m drawing one molecule a day inspired by proteins in the PDB @rcsbpdb.bsky.social
Day 2/31
Prompt WEAVE
N-terminal domain of a fibroin - a building block of silk fiber produced by silkworms.
Pdb: 3UA0
Next prompt is CROWN and I would love your suggestions!
Viral AlphaFold Database (VAD) is live in Science Advances
~27,000 predicted viral protein monomers & homodimers
Conserved folds across bacteria, archaea & eukaryotic viruses
New toxin–antitoxin system KreTA uncovered
Vast “functional darkness” remains uncharted
www.science.org/doi/10.1126/...
Océane Follonier @oceanef.bsky.social for
“From bytes to binders: design, score and optimize” #bc2basel #posterprize
Critical benchmarking of structure prediction methods has been crucial for measuring progress and detecting breakthroughs. But what will the future look like? Join the discussion at our workshop in Basel on September 8 - just before the [BC]2 conference.
@sib.swiss @biozentrum.unibas.ch
⬇️⬇️⬇️