This reframes the folding problem as: what determines the burial of the hard-to-predict core residues? The core identity score is available on GitHub with a Google Colab notebook. Try it on your own structures! 8/8 Link: github.com/agrigas115/core_identity_score
Posts by Alex Grigas
Can hydrophobicity scales identify the correct core? The textbook picture says hydrophobic collapse drives folding. But ~23% of incorrectly folded models have cores that are more hydrophobic than the native fold. Current scales can't solve core identity by maximization. 7/8
Core identity scoring is robust to ~10% random label noise. But sequence-based predictors don't make random errors; they fail on hydrophobic residues with high label entropy, which are precisely the residues that matter most for fold quality. 6/8
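A minimal sketch of what "~10% random label noise" could look like in practice: flip a fixed fraction of binary core/surface labels at random. The function name `flip_labels`, the seed, and the choice to flip exactly `round(noise_fraction * N)` labels are assumptions for illustration, not the procedure from the preprint.

```python
import random

def flip_labels(labels, noise_fraction, seed=0):
    """Flip a random subset of binary labels (1 = core, 0 = surface).

    Hypothetical helper: flips exactly round(noise_fraction * N) labels,
    chosen uniformly at random without replacement.
    """
    rng = random.Random(seed)
    n_flip = round(noise_fraction * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - lab if i in flip_idx else lab for i, lab in enumerate(labels)]

labels = [0, 1] * 50               # 100 toy labels
noisy = flip_labels(labels, 0.10)  # 10 of them flipped
```

Errors from a sequence-based predictor would not be uniform like this; they would concentrate on particular residues, which is the distinction the post draws.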
What about predicting from sequence alone? We trained a lightweight burial predictor on ESM2 embeddings and compared it to ESM2-predicted contacts. Predicting burial from sequence gives a better LDDT correlation than using contacts (ρ=0.82 vs 0.75), and combining the two doesn't help. 5/8
For a fair comparison, we measure bits/residue by accounting for label entropy and sending random subsets of the true labels. Core identity reaches ρ=0.9 at just 0.4 bits/residue, versus 0.68 for contacts and 0.58 for 3Di. It's the most efficient encoding we tested. 4/8
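One plausible reading of "bits/residue", sketched under stated assumptions: take the Shannon entropy of the binary label distribution and scale it by the fraction of labels that are actually sent. The thread does not give the exact estimator, so `bits_per_residue` and its formula are hypothetical.

```python
from math import log2

def binary_entropy(p):
    """Shannon entropy (in bits) of a binary label with P(label = 1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bits_per_residue(labels, fraction_sent):
    """Assumed estimate: fraction of labels sent times per-label entropy."""
    p_core = sum(labels) / len(labels)
    return fraction_sent * binary_entropy(p_core)

# Balanced core/surface labels carry 1 bit each; sending 40% of them
# costs 0.4 bits per residue under this assumption.
cost = bits_per_residue([1, 0, 1, 0], fraction_sent=0.4)
```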
Surprisingly, matching just the N binary burial labels to the experimental structure predicts LDDT nearly as well (ρ=0.94) as matching the full N(N-1)/2 contact map (ρ=0.95). A single label per residue rivals a pairwise representation. 3/8
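As a rough illustration of "matching the N binary burial labels", the sketch below scores a model by the fraction of residues whose core/surface label agrees with the experimental structure. The name `label_agreement` and the use of simple fractional agreement are assumptions; the preprint may define the core identity score differently.

```python
def label_agreement(model_labels, native_labels):
    """Fraction of residues with the same burial label in model and native."""
    if len(model_labels) != len(native_labels):
        raise ValueError("label lists must have the same length")
    if not native_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(m == n for m, n in zip(model_labels, native_labels))
    return matches / len(native_labels)

# Two of the four residues agree, so the score is 0.5.
score = label_agreement([1, 0, 1, 1], [1, 1, 1, 0])
```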
To test this, we encode ~24,000 CASP structural models using different representations, for example contact maps (N(N-1)/2 pairwise binary labels) and core identity (N binary labels), and ask: how well does each predict the accuracy of the backbone (LDDT)? 2/8
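To make the size difference between the two encodings concrete, here is a minimal sketch that builds both from a list of per-residue 3D coordinates. The 8 Å cutoff and the neighbor-count proxy for burial are assumptions chosen for illustration (burial is often defined via solvent accessibility instead), and both function names are hypothetical.

```python
from itertools import combinations
from math import dist

def contact_map(coords, cutoff=8.0):
    """Pairwise encoding: one binary label per i<j pair, N(N-1)/2 in total."""
    return {(i, j): int(dist(coords[i], coords[j]) < cutoff)
            for i, j in combinations(range(len(coords)), 2)}

def burial_labels(coords, cutoff=8.0, min_neighbors=3):
    """Per-residue encoding: N binary labels, 1 = core, 0 = surface.

    Assumed proxy: a residue counts as buried when at least
    `min_neighbors` other residues lie within `cutoff`.
    """
    n = len(coords)
    labels = []
    for i in range(n):
        neighbors = sum(1 for j in range(n)
                        if j != i and dist(coords[i], coords[j]) < cutoff)
        labels.append(int(neighbors >= min_neighbors))
    return labels

# Toy "structure": four clustered residues and one far away.
coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (50, 0, 0)]
pairs = contact_map(coords)   # 10 pairwise labels for N = 5
core = burial_labels(coords)  # 5 per-residue labels
```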
How much information does it take to fold a protein? Not much, if you use the right information! We find that residue burial, a binary label of core vs surface, encodes a protein's fold highly efficiently and even improves ESM2's structure representation. 1/8 www.biorxiv.org/content/10.6...
Excited to highlight a new preprint about mechanical contributions to tissue homeostasis, from the Manning group in collaboration with the amazing Carien Niessen and Sara Wickstrom @sarawickstrom.bsky.social labs, spearheaded by Dr. Somiealo Azote: www.biorxiv.org/content/10.6...
9/ We end by doing some mean-field theory to explain the numerical results. We’re very excited to go hunting for the molecular mechanisms in vivo guided by our modeling and to add new details to the models guided by new experiments.
8/ We also developed a second model of CIL where instead of crawling away from neighbors, cells reach out and grab new neighbors away from their current neighbors. This pulling model also results in a fluid under tension.
7/ Next, we figured the simplest way to fight the clumping instability is to direct that motion not randomly, but away from a cell’s neighbors, like contact inhibition of locomotion. This generates a tensioned fluid network of cells that, while directed, still flows diffusively.
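A toy version of "motion directed away from a cell's neighbors": point the cell along the unit vector from the mean neighbor position to the cell. This is only a hypothetical rule to illustrate the idea; the thread does not spell out the preprint's actual CIL model, and `cil_direction` is an invented name.

```python
import math

def cil_direction(cell, neighbors):
    """Unit vector (2D) pointing away from the mean position of the neighbors.

    Returns (0.0, 0.0) if there are no neighbors or they balance exactly.
    """
    if not neighbors:
        return (0.0, 0.0)
    cx = sum(n[0] for n in neighbors) / len(neighbors)
    cy = sum(n[1] for n in neighbors) / len(neighbors)
    dx, dy = cell[0] - cx, cell[1] - cy
    norm = math.hypot(dx, dy)
    if norm == 0:
        return (0.0, 0.0)
    return (dx / norm, dy / norm)

# A single neighbor to the right pushes the cell to the left.
direction = cil_direction((0, 0), [(1, 0)])
```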
6/ We first tried maintaining the cell network by letting the cells move with a random self-propelled walk commonly used to model cell motion. However, no set of parameters could generate the kind of material we see in experiments. It always clumps!
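For readers unfamiliar with self-propelled random walks, here is a minimal 2D sketch in the style of an active Brownian particle: constant speed, with a heading that diffuses over time. The specific form and all parameter names (`speed`, `rot_noise`, `dt`) are assumptions; the thread does not say which variant was used.

```python
import math
import random

def self_propelled_walk(n_steps, speed=1.0, rot_noise=0.3, dt=0.1, seed=0):
    """Trajectory of one cell moving at constant speed with a diffusing heading."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    theta = rng.uniform(0, 2 * math.pi)  # initial heading
    path = [(x, y)]
    for _ in range(n_steps):
        # Rotational diffusion: heading takes a small Gaussian step.
        theta += rot_noise * math.sqrt(dt) * rng.gauss(0, 1)
        x += speed * math.cos(theta) * dt
        y += speed * math.sin(theta) * dt
        path.append((x, y))
    return path

path = self_propelled_walk(100)
```

Nothing in this rule depends on where neighboring cells are, which is the property the directed variants change.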
5/ But how can a network be under tension and flowing at the same time? Breaking one bond would cause a cell to be pulled towards its remaining neighbors. To model this, we use a simple but effective model of hysteretic sticking between cells.
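"Hysteretic sticking" can be illustrated with a two-threshold rule: a bond forms only when two cells come within a short distance, but once formed it survives until they are pulled apart past a longer distance. The thresholds `r_form` and `r_break` and the function `update_bond` are hypothetical; this is a sketch of the general idea rather than the model from the preprint.

```python
def update_bond(bonded, distance, r_form=1.0, r_break=2.0):
    """Return the new bond state for a pair of cells at the given distance.

    Hysteresis: unbonded pairs stick only below r_form, while bonded
    pairs stay stuck until the distance exceeds r_break (> r_form).
    """
    if bonded:
        return distance <= r_break
    return distance <= r_form

# At the same distance of 1.5, the outcome depends on the pair's history.
stays_bonded = update_bond(True, 1.5)    # existing bond survives
stays_apart = update_bond(False, 1.5)    # no new bond forms
```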
4/ However, those stellate appendages made us wonder: is there tension across those arms? By ablating the PSM with a laser, we measure a significant retraction velocity, suggesting that yes, the cellular network is under tension!
3/ We confirm that, just like in many other systems of body-axis elongation, when we track the relative motion of the cells in the avian PSM, they move diffusively like a fluid.
2/ First, notice how different the cell architecture in the avian presomitic mesoderm (PSM) is from confluent and bubble-like cells seen in other tissues. It looks more like a network of cells attached by stellate arms.
How do sparse mesenchymal cells, with unique stellate arms spanning large gaps between cells, maintain their network while still flowing during development? In our new preprint we describe the avian PSM as a fluid under tension and develop new theory to explain it: www.biorxiv.org/content/10.6...
All the code to run the all-atom protein model is available on my GitHub. All you need is a protein PDB file with the hydrogens added using the Reduce software. Please let me know if you want to use it and have any trouble!
github.com/agrigas115/H...
We first review the jamming transition, show how it applies to polymer collapse and then develop a new all-atom protein model that captures just the complex shapes of amino acids plus their hydrophobic interactions.