Alex Grigas (@agrigas) Bsky

GitHub - agrigas115/core_identity_score: Predict a protein structure's backbone accuracy (average LDDT) by comparing its core residues to the core predicted using ESM2 embeddings

This reframes the folding problem as: what determines the burial of the hard-to-predict core residues? The core identity score is available on GitHub with a Google Colab notebook. Try it on your own structures! 8/8 Link: github.com/agrigas115/core_identity_score

2 weeks ago

Can hydrophobicity scales identify the correct core? The textbook picture says hydrophobic collapse drives folding. But ~23% of incorrectly folded models have cores that are more hydrophobic than the native fold, so maximizing hydrophobicity with current scales can't solve core identity. 7/8

2 weeks ago
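
The hydrophobicity comparison in 7/8 can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale as one example scale (the posts do not say which scales were tested) and uses a made-up sequence and made-up core labels; `mean_core_hydropathy` is a hypothetical helper, not a function from the core_identity_score repository.

```python
# Kyte-Doolittle hydropathy values (one common scale; an assumption here,
# since the posts do not name the scales that were tested).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_core_hydropathy(sequence, core_labels):
    """Average hydropathy over residues labeled 1 (core) in `core_labels`."""
    if len(sequence) != len(core_labels):
        raise ValueError("sequence and core_labels must have the same length")
    core_values = [KYTE_DOOLITTLE[aa] for aa, c in zip(sequence, core_labels) if c]
    if not core_values:
        raise ValueError("no residues are labeled as core")
    return sum(core_values) / len(core_values)


# Toy data (hypothetical): the same sequence with two different core assignments.
sequence = "LKVEAIDF"
native_core = [1, 0, 1, 0, 1, 0, 0, 0]  # core = L, V, A
decoy_core = [1, 0, 1, 0, 0, 1, 0, 0]   # core = L, V, I

native_mean = mean_core_hydropathy(sequence, native_core)
decoy_mean = mean_core_hydropathy(sequence, decoy_core)

# In this toy case the incorrect core is the more hydrophobic one, so picking
# the core assignment that maximizes hydropathy would select the decoy.
decoy_is_more_hydrophobic = decoy_mean > native_mean
```

A real comparison would take the core labels from a burial calculation on each structural model rather than from hand-written lists.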

Core identity scoring is robust to ~10% random label noise. But sequence-based predictors don't make random errors: they fail on hydrophobic residues with high label entropy, which are precisely the residues that matter most for fold quality. 6/8

2 weeks ago
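
As a rough illustration of the "random label noise" in 6/8, the sketch below flips a fixed fraction of binary core labels at random. The function name `flip_labels`, the seed, and the toy labels are assumptions for illustration; the preprint's noise protocol may differ.

```python
import random


def flip_labels(labels, fraction, seed=0):
    """Return a copy of `labels` with a random `fraction` of entries flipped (0 <-> 1)."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n_flip = round(fraction * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


# Toy core labels for a 50-residue protein (hypothetical data).
true_core = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1] * 5
noisy_core = flip_labels(true_core, fraction=0.10)

# Exactly 10% of the 50 labels differ from the originals.
n_changed = sum(a != b for a, b in zip(true_core, noisy_core))
```

Uniform flips like these are the benign case. The post's point is that a sequence-based predictor's errors are concentrated on particular residues, which this sketch does not model.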

What about predicting from sequence alone? We trained a lightweight predictor on ESM2 embeddings for burial and compared to ESM2-predicted contacts. Predicting burial from sequence gives a better LDDT correlation than using contacts (ρ=0.82 vs 0.75), and combining the two doesn't help. 5/8

2 weeks ago
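
The ρ values quoted in this thread are correlations between a score and LDDT. The sketch below assumes ρ is Spearman's rank correlation, which the posts do not state, and computes it with the standard library only; the example scores and LDDT values are invented.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    if norm == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / norm


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: per-model burial-agreement scores and their LDDT values.
burial_score = [0.2, 0.5, 0.9, 0.7]
lddt = [30.0, 55.0, 90.0, 80.0]
rho = spearman(burial_score, lddt)  # the two lists are ordered identically
```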

To compare representations fairly, we measure bits/residue by accounting for label entropy and sending random subsets of the true labels. Core identity reaches ρ=0.9 at just 0.4 bits/residue, versus 0.68 for contacts and 0.58 for 3Di. It's the most efficient encoding we tested. 4/8

2 weeks ago
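
One simple way to read "bits/residue" in 4/8 is sketched below. It assumes each label that is sent costs the Shannon entropy of the label distribution, and that the total is divided by the number of residues. This accounting is an assumption made for illustration; the preprint's exact definition may differ, and `bits_per_residue` is a hypothetical helper.

```python
from math import log2


def binary_entropy(p):
    """Shannon entropy in bits of a binary label that equals 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def bits_per_residue(labels, n_residues, fraction_sent):
    """Bits sent per residue when a random `fraction_sent` of `labels` is transmitted.

    Assumes each label costs the entropy of the empirical label frequency.
    """
    p_one = sum(labels) / len(labels)
    bits_sent = fraction_sent * len(labels) * binary_entropy(p_one)
    return bits_sent / n_residues


# Toy 20-residue protein (hypothetical): one burial label per residue, half buried.
core_labels = [1, 0] * 10
core_bits = bits_per_residue(core_labels, n_residues=20, fraction_sent=0.4)

# A contact map has N(N-1)/2 = 190 labels for the same 20 residues, so sending
# even 10% of them costs more bits per residue in this toy accounting.
contact_labels = [1, 0] * 95
contact_bits = bits_per_residue(contact_labels, n_residues=20, fraction_sent=0.1)
```

Real burial and contact labels are not split 50/50, so their per-label entropy would be below 1 bit.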

Surprisingly, matching just the N binary burial labels to the experimental structure predicts LDDT nearly as well (ρ=0.94) as matching the full N(N-1)/2 contact map (ρ=0.95). A single label per residue rivals a pairwise representation. 3/8

2 weeks ago
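
The two representations compared in 3/8 differ mainly in size: N labels versus N(N-1)/2. The sketch below shows that difference on toy data and scores a model by the fraction of labels that match the reference. The 8 Å distance cutoff, the coordinates, and the simple fraction-matched score are assumptions for illustration, not the preprint's definitions.

```python
from itertools import combinations
from math import dist


def contact_labels(coords, cutoff=8.0):
    """Binary contact label for every residue pair: 1 if closer than `cutoff`."""
    return [int(dist(a, b) < cutoff) for a, b in combinations(coords, 2)]


def label_agreement(model, reference):
    """Fraction of binary labels in `model` that match `reference`."""
    if len(model) != len(reference):
        raise ValueError("label lists must have the same length")
    return sum(m == r for m, r in zip(model, reference)) / len(reference)


# Core identity: one binary label per residue (hypothetical values, N = 6).
native_core = [0, 1, 1, 0, 1, 0]
model_core = [0, 1, 0, 0, 1, 0]
core_score = label_agreement(model_core, native_core)

# Contact map: one label per residue pair (toy coordinates in angstroms, N = 4).
native_xyz = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model_xyz = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
native_contacts = contact_labels(native_xyz)  # 4 * 3 / 2 = 6 pairwise labels
model_contacts = contact_labels(model_xyz)
contact_score = label_agreement(model_contacts, native_contacts)
```

For a 200-residue protein the same functions would compare 200 burial labels against 19,900 contact labels.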

To test this, we encode ~24,000 CASP structural models using different representations, for example contact maps (N(N-1)/2 pairwise binary labels) and core identity (N binary labels), and ask: how well does each predict the accuracy of the backbone (LDDT)? 2/8

2 weeks ago

Residue burial encodes a protein's fold

Protein structure is controlled by a high-dimensional energy landscape, which is a function of all of the atomic coordinates of the protein. Can this landscape be accurately described by a low-dimensional representation? We find that residue core identity, a binary N-dimensional encoding indicating whether each of the N amino acids in a protein is buried in the core or not, can predict the protein's backbone conformation more efficiently than all other representations that we tested. Core identity is 4 times more efficient than previous estimates of the bits per residue needed to encode a protein's native fold, 2 times more efficient than the Cα contact map, and 1.5 times more efficient than the machine-learned embeddings from FoldSeek's 3Di. Even when the folded structure is unavailable, predicting each residue's burial from sequence yields a more accurate estimate of fold quality than predicting pairwise contacts from the same sequence information. Thus, this work emphasizes that the problem of determining a protein's native fold can be re-framed as predicting each residue's core identity.

Competing Interest Statement: The authors have declared no competing interest. Chan Zuckerberg Initiative (United States), 2023-329572; NIH, T32GM145452

How much information does it take to fold a protein? Not much, if you use the right information! We find that residue burial, a binary label of core vs surface, encodes a protein's fold highly efficiently and even improves ESM2's structure representation. 1/8 www.biorxiv.org/content/10.6...

2 weeks ago

Excited to highlight a new preprint about mechanical contributions to tissue homeostasis, from the Manning group in collaboration with the amazing Carien Niessen and Sara Wickstrom @sarawickstrom.bsky.social labs, spearheaded by Dr. Somiealo Azote: www.biorxiv.org/content/10.6...

2 months ago

9/ We end by doing some mean-field theory to explain the numerical results. We’re very excited to go hunting for the molecular mechanisms in vivo guided by our modeling and to add new details to the models guided by new experiments.

4 months ago

8/ We also developed a second model of CIL where instead of crawling away from neighbors, cells reach out and grab new neighbors away from their current neighbors. This pulling model also results in a fluid under tension.

4 months ago

7/ Next, we figured the simplest way to fight the clumping instability is to direct that motion not randomly, but away from a cell’s neighbors, like contact inhibition of locomotion. This generates a tensioned fluid network of cells that, while directed, still flows diffusively.

4 months ago

6/ We first tried maintaining the cell network by letting the cells move with a random self-propelled walk commonly used to model cell motion. However, no set of parameters could generate the kind of material we see in experiments. It always clumps!

4 months ago

5/ But how can a network be under tension and flowing at the same time? Breaking one bond would cause a cell to be pulled towards its remaining neighbors. To model this, we use a simple but effective model of hysteretic sticking between cells.

4 months ago
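
One common way to express "hysteretic sticking" such as that mentioned in 5/ is a bond that forms at a short distance but only breaks at a longer one. The sketch below assumes that reading; the thresholds `r_form` and `r_break`, the function `update_bond`, and the distances are all hypothetical, and the preprint's sticking rule may be defined differently.

```python
def update_bond(bonded, distance, r_form=1.0, r_break=1.5):
    """Return the new bond state for one pair of cells.

    A bond forms only when the cells come closer than `r_form`, and an
    existing bond survives until they separate beyond `r_break` (> r_form).
    """
    if bonded:
        return distance <= r_break
    return distance < r_form


# Hypothetical separation between two cells over six time steps.
distances = [1.2, 0.9, 1.2, 1.4, 1.6, 1.2]

bonded = False
history = []
for d in distances:
    bonded = update_bond(bonded, d)
    history.append(bonded)

# The pair is unbonded at d = 1.2 before ever touching, bonded at d = 1.2
# after touching, and unbonded at d = 1.2 again once stretched past r_break.
```

The bond state at a given distance depends on the pair's history, which is what lets a stretched bond carry tension until it finally breaks.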

4/ However, those stellate appendages made us wonder: is there tension across those arms? By ablating the PSM with a laser, we measure a significant retraction velocity, suggesting that yes, the cellular network is under tension!

4 months ago

3/ We confirm that, just like in many other systems of body-axis elongation, when we track the relative motion of the cells in the avian PSM, they move diffusively like a fluid.

4 months ago
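
A standard way to check whether tracked motion like that in 3/ is diffusive is the mean squared displacement (MSD), which grows linearly with lag time for diffusion and quadratically for straight-line motion. The sketch below is a generic time-averaged MSD, not the analysis from the preprint. The track could hold the separation vector between two neighboring cells, and the data here are invented.

```python
def mean_squared_displacement(track, lag):
    """Time-averaged MSD of one 2D track, a list of (x, y), at an integer frame lag."""
    if not 0 < lag < len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    squared = [
        (x2 - x1) ** 2 + (y2 - y1) ** 2
        for (x1, y1), (x2, y2) in zip(track, track[lag:])
    ]
    return sum(squared) / len(squared)


# Invented ballistic track: one unit step in x per frame.
ballistic = [(float(t), 0.0) for t in range(10)]
msd_lag1 = mean_squared_displacement(ballistic, 1)
msd_lag2 = mean_squared_displacement(ballistic, 2)

# Doubling the lag quadruples the MSD here (quadratic growth). A diffusive
# track would instead give roughly double the MSD at double the lag.
growth = msd_lag2 / msd_lag1
```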

2/ First, notice how different the cell architecture in the avian presomitic mesoderm (PSM) is from the confluent, bubble-like cells seen in other tissues. It looks more like a network of cells attached by stellate arms.

4 months ago

Sparse mesenchymal cell networks as a fluid under tension
Sparse mesenchymal cellular networks are ubiquitous across animals, shaping both embryonic and adult structures through dynamic interactions with epithelia. Yet, the physical principles underlying the...

How do sparse mesenchymal cells, with unique stellate arms spanning large gaps between cells, maintain their network while still flowing during development? In our new preprint we describe the avian PSM as a fluid under tension and develop new theory to explain it: www.biorxiv.org/content/10.6...

4 months ago

GitHub - agrigas115/HS-HP: HS+HP all-atom protein model

All the code to run the all-atom protein model is available on my GitHub. All you need is a protein PDB file with the hydrogens added using the Reduce software. Please let me know if you want to use it and have any trouble!
github.com/agrigas115/H...

1 year ago

We first review the jamming transition, show how it applies to polymer collapse and then develop a new all-atom protein model that captures just the complex shapes of amino acids plus their hydrophobic interactions.

1 year ago

Protein Folding as a Jamming Transition
Densely packed protein cores have the same interior packing density irrespective of overall fold, arising from a jamming transition where amino acids reach a critical, incompressible density, as expla...

Our new paper relating protein folding and jamming is out in PRX Life! All protein cores are densely packed irrespective of overall fold, and we show this arises from a jamming transition where amino acids reach a critical, incompressible density.
journals.aps.org/prxlife/abst...

1 year ago

Posts by Alex Grigas