Computational protein design
- "This Primer provides an introduction to the main approaches in computational protein design, covering both physics-based and machine-learning-based tools. It aims to be accessible to biological, physical and computer scientists alike."
www.nature.com/articles/s43...
Posts by Leo Zang
Protein-Based Degraders: From Chemical Biology Tools to Neo-Therapeutics
- "we provide a comprehensive and critical review of studies that have used proteins and peptides to mediate the degradation and hence the functional control of otherwise challenging disease-relevant protein targets. We describe existing platforms for protein/peptide-based ligand identification and the drug delivery systems that might be exploited for the delivery of biologic-based degraders."
Link: pubs.acs.org/doi/10.1021/...
Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review
- "We review these methods from a unified perspective, demonstrating that current techniques -- such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling, and classifier guidance -- aim to approximate soft optimal denoising processes (a.k.a. policies in RL) that combine pre-trained denoising processes with value functions serving as look-ahead functions that predict from intermediate states to terminal rewards."
arxiv.org/abs/2501.09685
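The SMC-based guidance mentioned in the quote can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' code. `denoise_step` stands in for a pre-trained denoiser, and `value_fn` stands in for a look-ahead value function. `alpha` is the temperature of the soft-optimal target p_pre(x_0) · exp(r(x_0)/alpha). The 1D toy model and all function names are hypothetical. At every step, particles are propagated with the pre-trained transition, reweighted by the change in exp(value/alpha), and resampled.

```python
import math
import random
from typing import Callable, List, Sequence

def smc_guided_sampling(
    init_particles: Sequence[float],
    denoise_step: Callable[[float, int, random.Random], float],
    value_fn: Callable[[float, int], float],
    num_steps: int,
    alpha: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Toy SMC guidance targeting p_pre(x_0) * exp(r(x_0) / alpha).

    value_fn(x, t) is a look-ahead estimate of the terminal reward from
    state x at step t (at t = 0 it should equal the reward itself).
    """
    rng = random.Random(seed)
    particles = list(init_particles)
    # Start from zero so the weights telescope to exp(v(x_0) / alpha).
    prev_values = [0.0] * len(particles)
    for t in range(num_steps, 0, -1):
        # Propose x_{t-1} with the pre-trained denoising transition.
        particles = [denoise_step(x, t, rng) for x in particles]
        values = [value_fn(x, t - 1) for x in particles]
        # Incremental log-weights: change in look-ahead value, scaled by 1/alpha.
        log_w = [(v - pv) / alpha for v, pv in zip(values, prev_values)]
        max_lw = max(log_w)  # subtract the max for numerical stability
        weights = [math.exp(lw - max_lw) for lw in log_w]
        # Multinomial resampling concentrates particles on high-value states.
        idx = rng.choices(range(len(particles)), weights=weights, k=len(particles))
        particles = [particles[i] for i in idx]
        prev_values = [values[i] for i in idx]
    return particles

# Hypothetical 1D toy: the "denoiser" shrinks toward 0 and adds noise,
# and the reward prefers samples near 2.0.
def toy_denoise_step(x: float, t: int, rng: random.Random) -> float:
    return 0.9 * x + rng.gauss(0.0, 0.5)

def toy_value(x: float, t: int) -> float:
    return -(x - 2.0) ** 2  # same look-ahead at every step (a simplification)

init_rng = random.Random(1)
init = [init_rng.gauss(0.0, 1.0) for _ in range(200)]
guided = smc_guided_sampling(init, toy_denoise_step, toy_value, 10, alpha=0.5)
plain = smc_guided_sampling(init, toy_denoise_step, toy_value, 10, alpha=1e9)
print(sum(guided) / len(guided), sum(plain) / len(plain))
```

With a very large `alpha` the weights become effectively uniform, and the loop reduces to unguided sampling from the pre-trained process. A smaller `alpha` trades fidelity to the pre-trained model for higher reward.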
Targeting protein–ligand neosurfaces with a generalizable deep learning tool | @Nature
- MaSIF-neosurf can design binders for protein-ligand complexes, targeting neosurfaces (i.e., ligand-induced structural changes on the protein surface)
- Use MaSIF-search to predict buried surfaces and identify complementary surface fingerprints from a database of protein fragments (~640,000)
- Construct full-length proteins carrying the binding motifs and refine their structures using the Rosetta FastDesign protocol and grafting (with a potential round of LigandMPNN optimization)
- Design binders for Bcl2–venetoclax, DB3–progesterone, and PDF1–actinonin and validate them experimentally
- Benchmark MaSIF-neosurf against RFAA on 14 ligand-induced PPI complexes with 8,907 decoys from PDBBind
Link: www.nature.com/articles/s41...
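The fragment-retrieval step can be pictured as a nearest-neighbour search over surface descriptors. This minimal sketch assumes, as in the original MaSIF-search work, that each surface patch is summarized by a fixed-length fingerprint vector. It also assumes that complementary patches are trained to lie close together in Euclidean distance. The fragment names, the 3-dimensional vectors and the `max_dist` cutoff are hypothetical. Real fingerprints are longer. A library of ~640,000 fragments would also call for an approximate-nearest-neighbour index rather than this linear scan.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

def top_k_fragments(
    target_fp: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 3,
    max_dist: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Return up to k (fragment, distance) pairs with the smallest descriptor distance."""
    scored = []
    for name, fp in library.items():
        d = math.dist(target_fp, fp)  # Euclidean distance between fingerprints
        if max_dist is None or d <= max_dist:
            scored.append((d, name))
    best = heapq.nsmallest(k, scored)  # lowest distance = best match
    return [(name, d) for d, name in best]

# Hypothetical toy library of fragment fingerprints.
library = {
    "frag_a": [0.0, 0.0, 1.0],
    "frag_b": [1.0, 1.0, 1.0],
    "frag_c": [0.1, 0.0, 0.9],
}
hits = top_k_fragments([0.0, 0.0, 1.0], library, k=2)
print(hits)
```

The optional `max_dist` cutoff discards poor matches even when fewer than `k` candidates remain.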
Massively parallel characterization of transcriptional regulatory elements
- Develop an optimized lentiMPRA (lentiviral massively parallel reporter assay) method to test the regulatory activity of >680,000 sequences across three cell types (HepG2, K562, WTC11)
- Train sequence-based models to predict the activity of regulatory elements (MPRALegNet, MPRAnn, EnformerMPRA, and SeiMPRA)
- Use MPRALegNet for TFBS combination prediction, fine-mapping, and variant effect prediction
Link: www.nature.com/articles/s41...
Integrating genetic algorithms and language models for enhanced enzyme design
academic.oup.com/bib/article/...
DNALONGBENCH: A Benchmark Suite for Long-Range DNA Prediction Tasks
www.biorxiv.org/content/10.1...
Engineering of CRISPR-Cas PAM recognition using deep learning of vast evolutionary data
www.biorxiv.org/content/10.1...
A review of deep learning models for the prediction of chromatin interactions with DNA and epigenomic profiles | @BriefingBioinfo
- "This review systematically summarizes recent advances in chromatin interaction matrix prediction models...This article details various models, focusing on how one-dimensional (1D) information transforms into the 3D structure chromatin interactions"
Link: academic.oup.com/bib/article/...
EnzymeCAGE: A Geometric Foundation Model for Enzyme Retrieval with Evolutionary Insights
www.biorxiv.org/content/10.1...
Semantic mining of functional de novo genes from a genomic language model
www.biorxiv.org/content/10.1...
Bridging Sequence-Structure Alignment in RNA Foundation Models
arxiv.org/abs/2407.11242
Mapping targetable sites on the human surfaceome for the design of novel binders
www.biorxiv.org/content/10.1...
NeuralPLexer3: Physio-Realistic Biomolecular Complex Structure Prediction with Flow Models
arxiv.org/abs/2412.10743
FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction
arxiv.org/abs/2412.10966
Leveraging ancestral sequence reconstruction for protein representation learning
www.nature.com/articles/s42...
Guiding Generative Protein Language Models with Reinforcement Learning
arxiv.org/abs/2412.12979
Harnessing the biology of regulatory T cells to treat disease
- "This Review will discuss recent advances in our understanding of human Treg cell biology, with a focus on mechanisms of action and strategies to assess outcomes of Treg cell-targeted therapies."
www.nature.com/articles/s41...
IgDesign: In vitro validated antibody design against multiple therapeutic antigens using inverse folding
www.biorxiv.org/content/10.1...
Annotation-guided Protein Design with Multi-Level Domain Alignment
arxiv.org/abs/2404.16866
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models
arxiv.org/abs/2406.10391
mRNA m6A detection | @MethodsPrimers
- "This Primer outlines the available tools for detecting and mapping m6A, discusses the strengths and limitations of each method and offers guidance on selecting the most suitable approach."
www.nature.com/articles/s43...
Concept Bottleneck Language Models For protein design
- Introduce CB-pLM (Concept Bottleneck Protein Language Models), ranging from 24M to 3B parameters, trained on UniRef50 and SwissProt with 718 concepts (including cluster name, biological process, and Biopython-derived features)
- Add a Concept Bottleneck Module (using the <cls> token) and an Orthogonality Network to a standard BERT-like architecture
- Train the model with an MLM loss, a concept loss (mean squared error on the concept embedding), and an orthogonality loss (cosine similarity between known and unknown embeddings)
- Use a gradient-based approximation to modify protein sequences so as to increase or decrease specific concept values (e.g., choosing which amino acids to substitute to increase aromaticity)
arxiv.org/abs/2411.06090
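The three CB-pLM training objectives can be combined as a weighted sum. This is a minimal, framework-free illustration, not the CB-pLM implementation. It takes the MLM loss as an already-computed scalar and uses plain lists in place of tensors. The weights `w_concept` and `w_orth` are hypothetical. Taking the absolute value of the cosine similarity is also an assumption: it penalizes both positive and negative alignment between the known-concept and unknown embeddings.

```python
import math
from typing import Sequence

def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and ground-truth concept values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def cb_plm_style_loss(
    mlm_loss: float,
    concept_pred: Sequence[float],
    concept_true: Sequence[float],
    known_emb: Sequence[float],
    unknown_emb: Sequence[float],
    w_concept: float = 1.0,  # hypothetical weighting
    w_orth: float = 1.0,     # hypothetical weighting
) -> float:
    concept_loss = mse(concept_pred, concept_true)
    # Penalize any alignment between known-concept and unknown embeddings.
    orth_loss = abs(cosine(known_emb, unknown_emb))
    return mlm_loss + w_concept * concept_loss + w_orth * orth_loss

total = cb_plm_style_loss(0.5, [1.0, 2.0], [1.0, 4.0], [1.0, 0.0], [0.0, 1.0])
print(total)
```

Setting `w_orth` to 0 leaves a plain concept-supervised MLM objective. Orthogonal embeddings, as in the usage example, contribute nothing to the total.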
Benchmarking recent computational tools for DNA-binding protein identification
- "we conduct an unbiased benchmarking of 11 state-of-the-art computational tools as well as traditional tools such as ScanProsite, BLAST, and HMMER for identifying DBPs."
Link: academic.oup.com/bib/article/...
Title correction:
A general temperature-guided language model to design proteins of enhanced stability and activity
- Mouse level: Human-homologous protein data sourced from OGEE database
- Cell line level: Protein essentiality data from Project Score database, providing insights across 323 different human cell lines