Added a JAX translation of the excellent Proteina-Complexa (from NVIDIA, @kdidi.bsky.social, @karstenkreis.bsky.social) to mosaic. You can do beam search with any mosaic loss (e.g. protenix + mpnn), and JAX will generate efficient GPU/TPU code.
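Here's a minimal sketch of what beam search over a composite loss can look like in JAX; `propose` and `composite_loss` below are placeholders, not mosaic's actual API.

```python
# Minimal beam-search sketch in JAX. `propose` and `composite_loss` are
# hypothetical stand-ins; mosaic's real interface may differ.
from functools import partial
import jax
import jax.numpy as jnp

def propose(key, beam, branch=4):
    # Hypothetical proposal move: perturb each beam member into `branch` children.
    noise = jax.random.normal(key, (beam.shape[0], branch) + beam.shape[1:])
    return (beam[:, None] + 0.1 * noise).reshape(-1, beam.shape[-1])

def composite_loss(x):
    # Stand-in for a weighted sum of mosaic losses (e.g. protenix + mpnn terms).
    return jnp.sum(x ** 2, axis=-1)

@partial(jax.jit, static_argnames="beam_width")
def beam_step(key, beam, beam_width=8):
    cand = propose(key, beam)                  # expand every beam member
    scores = jax.vmap(composite_loss)(cand)    # score all candidates in parallel
    keep = jnp.argsort(scores)[:beam_width]    # keep the lowest-loss candidates
    return cand[keep]

key = jax.random.PRNGKey(0)
beam = jax.random.normal(key, (8, 16))         # toy "design" states
for _ in range(10):
    key, sub = jax.random.split(key)
    beam = beam_step(sub, beam)
```

Because the step is `jit`-compiled and candidates are scored with `vmap`, XLA can fuse the whole expand-score-select loop into efficient GPU/TPU code.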
My lab was lucky to be able to help test Proteina-Complexa binders experimentally. We were blown away by the results. Congrats to our colleagues at NVIDIA!
Efficient protein structure prediction from compact computers to datacenters with OpenFold-TRT www.biorxiv.org/content/10.64898/2026.03...
Also check out the great post from @karstenkreis.bsky.social about our work! bsky.app/profile/kars...
That's it from me! Happy to hear what people think once they try it out (code, weights, etc. all available under a permissive license!) 19/19
The whole Proteina effort started more than 2 years ago when Arash Vahdat believed in me and my two partners in crime, Tomas Geffner and @karstenkreis.bsky.social, to start this crazy protein thing at NVIDIA. Unreal to see how far we have come since then as a field. 18/n
Case study 5/5: first fully de novo carbohydrate binders
We targeted the blood-group B antigen and achieved a 21% hit rate.
Massive shoutout to my long-term enzyme collaborator @mattpenner.bsky.social at Cambridge for powering this through; excited about what codesign will unlock for enzymes! 17/n
Case study 4/5: Nipah virus
In the Adaptyv binder competition setting, Proteina-Complexa produced a nanomolar binder (56 nM) to NiV-G, targeting a recessed receptor-binding site. We also show how to go beyond partial diffusion and perform joint sequence-structure redesign. 16/n
Case study 3/5: Targeting kinases (PAK1 + CK1δ) with peptide binders
Across mini-protein and short-peptide regimes (<31 aa peptides and 49–74 aa mini-binders), we observed strong hit rates on difficult target sites. 15/n
Case study 2/5: ActRIIA, a target for GLP-1 associated muscle wasting
We designed de novo binders that block myostatin signaling in cells. The tightest binder reached KD = 36 nM, with functional downstream inhibition. 14/n
Case study 1/5: PDGFR
We achieved a 63.5% hit rate, with top binders reaching double-digit picomolar affinity (best reported at 93.6 pM). Very cool collab with @novonordisk.bsky.social 13/n
As part of this campaign, we also ran the biggest head-to-head wet-lab comparison of design methods ever. Proteina-Complexa outperformed baselines in this setting. My personal highlight: we are the only method where codesign shines, outperforming MPNN for the first time! 12/n
Massive campaign: ~1M designed candidates screened across 127 diverse/challenging targets with all-to-all binding readouts. We solved more than 2/3 of the targets given a very limited compute budget, often with quite challenging hotspots and crops. 11/n
That's the method side (come to ICLR in Rio to hear more at our Oral!). But in silico only goes so far, so now to the wet-lab side!
This was a major collaboration with @manifoldbio.bsky.social, @vivabiotech, @novonordisk.bsky.social, @CambridgeUniversity, @DukeUniversity, @lmu.de. 10/n
Quantitatively, these inference-time scaling strategies outperform both prior hallucination-based methods (under normalized compute budgets) and other generative models, setting a new SOTA in in silico binder design. 9/n
One of my highlights: because search is reward-guided, we can optimize for biophysical objectives during generation - including interface hydrogen-bond terms that promote denser interaction networks. With machine-learned force fields (MLFFs) improving rapidly, this becomes more and more powerful! 8/n
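As a toy illustration, a reward of this kind might just be a weighted combination of precomputed per-design metrics; the metric names and weights below are illustrative, not the paper's actual objective.

```python
# Toy composite reward over precomputed per-design metrics. The names and
# weights are illustrative assumptions, not the paper's actual objective.
import jax.numpy as jnp

def reward(fold_confidence, interface_energy, n_interface_hbonds,
           w_fold=1.0, w_energy=0.5, w_hbond=0.2):
    # Higher folding confidence and more interface hydrogen bonds are better;
    # lower (more negative) interaction energy is better.
    return (w_fold * fold_confidence
            - w_energy * interface_energy
            + w_hbond * n_interface_hbonds)

scores = reward(fold_confidence=jnp.array([0.91, 0.85]),
                interface_energy=jnp.array([-12.3, -8.7]),
                n_interface_hbonds=jnp.array([6.0, 3.0]))
```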
Data was also key: with the @martinsteinegger.bsky.social and @sooyoung-cha.bsky.social labs, we introduce Teddymer, a large synthetic binder-target pretraining resource built from domain-domain interactions derived from AFDB monomers (details + links on the project page). 7/n
In practice, this means scaling inference-time compute using strategies like beam search, MCTS, and Feynman–Kac steering to improve candidate quality and physical plausibility. Force-field metrics, interaction energies, folding scores: you choose your reward! 6/n
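To make the Feynman–Kac idea concrete, here is a hedged sequential-importance-resampling sketch in JAX: particles evolve under the generative model and get resampled in proportion to exp(lambda * reward). `denoise_step` and `reward` are stand-ins for the model's transition kernel and your chosen metric, not the paper's implementation.

```python
# Feynman-Kac steering as sequential importance resampling (sketch).
import jax
import jax.numpy as jnp

def denoise_step(key, x, t):
    # Placeholder reverse-diffusion update.
    return x + 0.05 * jax.random.normal(key, x.shape)

def reward(x):
    # Placeholder reward (e.g. folding score, interaction energy).
    return -jnp.sum(x ** 2, axis=-1)

def fk_steering(key, x, n_steps=50, lam=2.0):
    for t in range(n_steps):
        key, k_step, k_resample = jax.random.split(key, 3)
        x = denoise_step(k_step, x, t)
        logw = lam * reward(x)                        # potential per particle
        probs = jax.nn.softmax(logw)                  # normalized weights
        idx = jax.random.choice(k_resample, x.shape[0],
                                shape=(x.shape[0],), p=probs)
        x = x[idx]                                    # multinomial resampling
    return x

key = jax.random.PRNGKey(0)
particles = jax.random.normal(key, (64, 16))          # toy particle ensemble
steered = fk_steering(key, particles)
```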
We then combine the strengths of generative models with optimisation methods like hallucination. We call the inference recipe latent generative search: combining a learned generative prior with test-time search to steer toward better binders. 5/n
Core design choices (conceptual sketch below):
- Joint sequence-structure generation in a partially latent atomistic framework (building on La-Proteina).
- No discrete sequence tokenization loop.
- No mandatory post hoc inverse-folding redesign step: the first codesign model that outperforms MPNN! 4/n
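For intuition only, here is a conceptual JAX sketch of such codesign: one state carries coordinates and continuous per-residue sequence logits, and a single placeholder denoiser updates both together. This illustrates the idea; it is not the actual La-Proteina / Proteina-Complexa architecture.

```python
# Conceptual joint sequence-structure codesign sketch (not the real model).
import jax
import jax.numpy as jnp
from typing import NamedTuple

class DesignState(NamedTuple):
    coords: jnp.ndarray      # (n_res, 3) atom positions
    seq_logits: jnp.ndarray  # (n_res, 20) continuous amino-acid logits

def denoiser(state, t):
    # Placeholder network: both modalities are denoised in one pass, so there
    # is no discrete tokenization loop and no post hoc inverse-folding step.
    return DesignState(coords=0.99 * state.coords,
                       seq_logits=0.99 * state.seq_logits)

def sample(key, n_res=60, n_steps=100):
    k1, k2 = jax.random.split(key)
    state = DesignState(coords=jax.random.normal(k1, (n_res, 3)),
                        seq_logits=jax.random.normal(k2, (n_res, 20)))
    for t in range(n_steps):
        state = denoiser(state, t)
    return state.coords, jnp.argmax(state.seq_logits, axis=-1)

coords, sequence = sample(jax.random.PRNGKey(0))
```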
Proteina-Complexa is a generative binder design framework that supports diverse targets: single-chain proteins, multi-chain complexes, and small-molecule binding contexts. 3/n
Two papers, one story:
1) ICLR 2026 Oral: method + core modeling advances
2) Experimental validation: large-scale wet-lab evidence across many targets and campaigns
Let's start with the method paper. 2/n
📢 We're launching Proteina-Complexa, and after the Jensen keynote mention, we definitely had to post this thread now ;)
Atomistic binder design with generative pretraining + test-time compute, plus large-scale wet-lab validation.
Project page: research.nvidia.com/labs/genair/...
🧵 1/n
Too many REPA / RAE / representation alignment papers lately?
I was lost too, so I wrote a blog post that organizes the space into phases and zooms in on what actually matters for general/molecular ML.
Curious what folks think - link below!
Blog: kdidi.netlify.app/blog/ml/2025...
GPU-accelerated MMseqs2 offers tremendous speedup for homology retrieval, protein structure prediction with ColabFold, and protein structure search with Foldseek. @martinsteinegger.bsky.social @milot.bsky.social @machine.learning.bio
www.nature.com/articles/s41...
MMseqs2-GPU sets new standards in single-query search speed, allows near-instant search of big databases, scales to multiple GPUs, and stays fast even when databases exceed GPU memory. It enables ColabFold MSA generation in seconds and sub-second Foldseek search against AFDB50. 1/n
Paper: www.nature.com/articles/s41...
Code: mmseqs.com
For more details read the thread by the man himself: bsky.app/profile/ncor...
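If you want to try it, a minimal Python driver might look like the sketch below; the `makepaddeddb` module and `--gpu` flag follow the MMseqs2-GPU release docs as I understand them, so check `mmseqs search -h` for the exact options in your installed version.

```python
# Minimal sketch of a GPU-accelerated MMseqs2 search driven from Python.
# Flag/module names are taken from the MMseqs2-GPU release docs as I
# understand them; verify against `mmseqs search -h` for your version.
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

run(["mmseqs", "createdb", "targets.fasta", "targetDB"])
run(["mmseqs", "makepaddeddb", "targetDB", "targetDB_pad"])  # GPU-friendly padded DB
run(["mmseqs", "createdb", "query.fasta", "queryDB"])
run(["mmseqs", "search", "queryDB", "targetDB_pad", "resultDB", "tmp", "--gpu", "1"])
run(["mmseqs", "convertalis", "queryDB", "targetDB_pad", "resultDB", "result.m8"])
```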
An incredible project to witness, led by the most incredible dream team @ncorley.bsky.social, @simonmathis.bsky.social and Rohith Krishna with an amazing team inside and outside the Baker lab. Check it out, let us know what you think, and contribute to the codebase! 6/6
The preprint shows how atomworks leads to better reference conformers (and better predictions!), enables advanced features in RF3 like chirality-aware training or ligand templating, and narrows the performance gap to closed-source models. 5/6
`atomworks.ml`, on the other hand, offers advanced dataset featurization and sampling for deep learning workflows, all operating on the canonical AtomArray object from @biotite_python, so that all transforms are traceable and generalizable between models. 4/6
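As a hedged illustration of that pattern, here is real biotite code plus a hypothetical transform in the AtomArray-to-AtomArray style the post describes; `atomworks.ml`'s actual transform API may differ.

```python
# Illustration of an AtomArray -> AtomArray transform in the style described
# above. The biotite calls are real; `ca_only` is a hypothetical example
# transform, not part of atomworks.ml's actual API.
import biotite.structure as struc
import biotite.structure.io.pdb as pdb

def ca_only(atoms: struc.AtomArray) -> struc.AtomArray:
    # Keep only C-alpha atoms, so downstream featurization stays traceable
    # and model-agnostic.
    return atoms[atoms.atom_name == "CA"]

atoms = pdb.PDBFile.read("example.pdb").get_structure(model=1)
ca_atoms = ca_only(atoms)
print(ca_atoms.array_length(), "C-alpha atoms")
```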