Added a JAX translation of the excellent Proteina-Complexa (from NVIDIA, @kdidi.bsky.social, @karstenkreis.bsky.social) to mosaic. You can do beam search with any mosaic loss (e.g. protenix + mpnn), and JAX will generate efficient GPU/TPU code.
Posts by Karsten Kreis
Find all details, links to papers, model weights and code, and a link to the Teddymer dataset on our project page.
Project page: research.nvidia.com/labs/genair/...
Finally, please check Kieran's (@kdidi.bsky.social) great thread on Proteina-Complexa, too: bsky.app/profile/kdid...
Grateful to our amazing academic and industry partners for this exciting collaboration!
@manifoldbio.bsky.social, @novonordisk.bsky.social, Viva Biotech, Duke University (Soderling Lab), Cambridge University (Hollfelder Lab), @lmu.de (Khmelinskaia Group), @SeoulNatlUni (Steinegger Lab)
(19/n)
This was an effort by a brilliant team at NVIDIA & partners.
NVIDIA shoutouts: bsky.app/profile/kdid..., Danny Reidenbach, Zuobai Zhang, Guoqing Zhou, Zhonglin Cao, Tomas Geffner, Micha Livne, @machine.learning.bio, Emine Kucukbenli, @arashv.bsky.social
A privilege working with this team.
(18/n)
A key highlight: We achieved de novo design of carbohydrate binders. We targeted the Blood Group B antigen, reaching a 21% hit rate.
To the best of our knowledge, no prior computational methods have achieved de novo design against these challenging polar targets.
(17/n)
Targeting Viruses: As part of the recent Adaptyv binder competition, we used Proteina-Complexa to design a nanomolar binder (56 nM) against the Nipah virus G protein, successfully targeting its recessed receptor-binding site (see visualization of the experimental hit).
(16/n)
Kinase targets and varying binder size:
Spanning two distinct size regimes, we designed peptide (<31 amino acids) and miniprotein binders (49–74 amino acids) for kinase targets like CK1δ and PAK1. Proteina-Complexa achieved 40-50% hit rates for these difficult motifs.
(15/n)
Activin Binders:
Next, we designed de novo binders for the Activin receptor type IIA (ActRIIA) that block myostatin signaling in cells. Our tightest binder showed a KD of 36 nM and functional inhibition in downstream experiments.
(14/n)
Aside from the massive screen, we conducted case studies of different targets in separate experiments.
For instance, we generated de novo binders for PDGFR: We achieved a very high 63.5% hit rate, with top candidates reaching double-digit picomolar affinity (93.6 pM).
(13/n)
As part of this large screen, we also ran a large-scale, systematic wet lab comparison against contemporary binder design methods. Proteina-Complexa outperforms the baselines; see chart below.
In particular, its self-generated sequences work well - no more re-design.
(12/n)
First, we performed a high-throughput massive-scale screen across 127 diverse and challenging targets. We screened around 1 million candidates in total, measuring all-to-all binding.
86 targets yielded experimentally validated hits, with 74 of those being specific.
(11/n)
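For readers who want the screen numbers as rates, here is a small arithmetic sketch. It only uses the three counts stated in the post (127 targets, 86 with hits, 74 of those specific); the variable names are illustrative and the percentages are derived here, not quoted from the papers.

```python
# Counts taken from the post above; everything else is derived arithmetic.
targets_screened = 127
targets_with_hits = 86
targets_with_specific_hits = 74

# Share of screened targets with at least one validated hit.
hit_fraction = targets_with_hits / targets_screened
# Share of hit-yielding targets whose hits were specific.
specific_among_hits = targets_with_specific_hits / targets_with_hits
# Share of all screened targets with specific hits.
specific_overall = targets_with_specific_hits / targets_screened

print(f"targets with hits:          {hit_fraction:.1%}")         # 67.7%
print(f"specific among hit targets: {specific_among_hits:.1%}")  # 86.0%
print(f"specific among all targets: {specific_overall:.1%}")     # 58.3%
```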
Now experimental results. We conducted a massive campaign in collaboration with @ManifoldBio, @viva_biotech, @novonordisk, @Cambridge_Uni, @DukeU, @LMU_Muenchen.
"Latent Generative Search unlocks de novo Design of Untapped Biomolecular Interactions at Scale."
(10/n)
Quantitatively, our inference-time scaling strategies (for instance MCTS, Feynman-Kac Steering and Beam Search) outperform previous hallucination methods under normalized compute budgets, setting a new state of the art in in silico binder design.
(9/n)
Beyond regular binding, our model excels at atomistic motif scaffolding for enzyme design. On the AME benchmark, Proteina-Complexa significantly outperforms RFDiffusion2, faithfully reconstructing complex active site geometries.
(8/n)
We can explicitly optimize for biophysical properties during generation. In particular, Proteina-Complexa leverages interface hydrogen bond optimization, which steers the generative search toward candidates with denser, more stable interaction networks.
(7/n)
To overcome the scarcity of multimer data, we introduce Teddymer: a synthetic dataset of 0.5M clustered binder-target pairs, constructed by splitting AFDB monomers into structural domains to simulate realistic protein-protein interactions.
Link to data on project page.
(6/n)
"Latent Generative Search": We unify generative modeling with test-time optimization for high-performance protein design. By scaling compute at inference via strategies like beam search and MCTS, we can steer the model toward higher-quality, physically realistic binders.
(5/n)
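To make the beam search idea above concrete, here is a minimal, generic sketch in plain Python. It is not the Proteina-Complexa implementation: `beam_search`, `toy_expand`, `toy_score`, and the hidden target string are all hypothetical stand-ins. In a real pipeline the expansion step would come from the generative model and the score from a reward such as a structure-prediction confidence.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

State = TypeVar("State")


def beam_search(
    initial: State,
    expand: Callable[[State], Iterable[State]],
    score: Callable[[State], float],
    beam_width: int,
    num_steps: int,
) -> State:
    """Keep the `beam_width` best-scoring states after each expansion step."""
    beam = [initial]
    for _ in range(num_steps):
        # Propose successors for every state currently on the beam.
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        # Prune to the top-scoring candidates; a wider beam spends more
        # compute in exchange for a broader search.
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)


# Toy problem (hypothetical): grow a string one letter at a time and reward
# agreement with a hidden optimum.
ALPHABET = "ACDE"
HIDDEN_OPTIMUM = "DEAD"


def toy_expand(seq: str) -> List[str]:
    return [seq + letter for letter in ALPHABET]


def toy_score(seq: str) -> int:
    # Number of positions that agree with the hidden optimum.
    return sum(a == b for a, b in zip(seq, HIDDEN_OPTIMUM))


best = beam_search("", toy_expand, toy_score, beam_width=2, num_steps=4)
print(best, toy_score(best))
```

Raising `beam_width` or the number of children per state is the knob that trades inference compute for search quality in this sketch.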
Technical Core: We use La-Proteina's partially latent flow matching framework that co-designs protein sequence and atomistic structure jointly.
No discrete amino acid tokens.
No separate inverse folding (no sequence re-design).
Fully end-to-end atomistic generation.
(4/n)
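As background on flow matching in general (not on La-Proteina's specific architecture or sampler, which the posts do not describe), sampling typically means integrating a velocity field from noise at t=0 to data at t=1. The sketch below uses a closed-form toy velocity instead of a trained network: for a linear interpolation path toward a single target point, the velocity is `(target - x) / (1 - t)`. The names `euler_sample`, `toy_velocity`, and `TARGET` are illustrative assumptions.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy velocity is finite
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy target: a single point, so the exact velocity is known in closed form.
TARGET: Vector = [1.0, -2.0, 0.5]


def toy_velocity(x: Vector, t: float) -> Vector:
    # Velocity of the straight-line path that reaches TARGET at t=1.
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


sample = euler_sample(toy_velocity, [0.3, 0.1, -0.7], num_steps=20)
print(sample)
```

In a learned model, `toy_velocity` would be replaced by a neural network trained to regress such velocities, and the state would cover both coordinates and latent variables.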
Proteina-Complexa is a new protein binder design framework leveraging generative pretraining and test-time compute scaling.
It can generate in silico binder candidates for diverse targets, including single and multi-chain proteins and small molecule ligand targets.
(3/n)
We present two papers covering Complexa's core method development and a large-scale experimental validation effort.
Let's dive into the ICLR 2026 Oral paper on the method first: "Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute"
(2/n)
Proteina-Complexa
Atomistic Binder Design with Generative Pretraining and Test-Time Compute + Experimental Validation at Scale
Project page (research.nvidia.com/labs/genair/...) for:
Method paper (ICLR 2026 Oral)
Wet lab paper
Code & Models
Data
Thread
(1/n)
Partially-latent flow matching enables sequence-structure codesign of large proteins and functional motif scaffolding.
@kdidi.bsky.social @machine.learning.bio @karstenkreis.bsky.social @arashv.bsky.social
arxiv.org/html/2507.09...
3. Efficient Molecular Conformer Generation with SO(3) Averaged Flow-Matching and Reflow
openreview.net/forum?id=1B1...
I'm also on a panel on synthetic data (synthetic-data-iclr.github.io)!
I'm excited to discuss research and to meet new and old friends and collaborators!
(5/n)
@gembioworkshop.bsky.social
papers:
1. EquiJump: Protein Dynamics Simulation via SO(3)-Equivariant Stochastic Interpolants
arxiv.org/abs/2410.09667 (oral)
(screenshot below)
2. Hierarchical Protein Backbone Generation with Latent and Structure Diffusion
arxiv.org/abs/2504.09374
(4/n)
3. Energy-Based Diffusion Language Models for Text Generation
arxiv.org/abs/2410.21357
Posters 2
4. Truncated Consistency Models
arxiv.org/abs/2410.14895
Posters 4
(screenshot below)
(3/n)
Main track:
1. Proteina: Scaling Flow-based Protein Structure Generative Models
research.nvidia.com/labs/genair/...
Orals 3B, posters 4
(video below)
2. ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids
arxiv.org/abs/2503.05025
Orals 2C, posters 3
(2/n)
I'm at ICLR'25 in Singapore this week - happy to chat!
With wonderful co-authors, I'm co-presenting 4 main conference papers and 3 @gembioworkshop.bsky.social papers (gembio.ai), and I contribute to a panel (synthetic-data-iclr.github.io).
Overview in thread.
(1/n)
ProtComposer (ICLR'25 Oral) is a Swiss Army knife:
(i) Manually create new protein structure layouts? ✅
(ii) Generation with favorable designability/diversity/novelty trade-offs? ✅
(iii) Spatially edit given proteins? ✅
Very original work by the amazing @hannes-stark.bsky.social and Bowen Jing!
Check out our project page (research.nvidia.com/labs/genair/...), our paper (arxiv.org/abs/2503.00710), and our code (github.com/NVIDIA-Digit...).
We released 8 sets of weights, for all experiments, for you to play with!
Enjoy! And see you at ICLR'25!
(11/11)
Proteina is a fantastic collaboration with wonderful colleagues at NVIDIA:
Tomas Geffner*, @kdidi.bsky.social*, Zuobai Zhang*, Danny Reidenbach, Zhonglin Cao, @jyim.bsky.social, Mario Geiger, @machine.learning.bio, Emine Kucukbenli, @arashv.bsky.social, @karstenkreis.bsky.social*
(10/n)