Could we accelerate the discovery of the next GLP-1R agonist? 🚀 Here, we introduce PepTune, a multi-objective guided discrete diffusion model that generates target-specific peptides, while optimizing their therapeutic properties! 🪐
📜: arxiv.org/abs/2412.17780
💻: huggingface.co/ChatterjeeLa...
So excited to host the 2nd GEM Workshop at ICLR 2025! 🎉 We have amazing speakers/panelists 🧑‍🔬, money for new AI+Experiment collabs 🤑, and we're partnering with @naturebiotech.bsky.social to get the best papers into review! 📜 Definitely submit your new work and see you in Singapore!! 🇸🇬
So excited to have Christian (@machine.learning.bio) join us at Duke!! 💙 We're building such an amazing AIxBio community with @rohitsingh8080.bsky.social, @alextong.bsky.social, Phil Romero, and others. ESPECIALLY in all things bio-based language models! 💻 🧬 Come join us in Durham! 😈
🚨 Current graduate students! If you're interested in developing and leveraging generative language models for therapeutics design, please apply to FutureHouse's postdoctoral fellowship and indicate my lab as an option! 😃 $125k salary and access to all of their amazing resources! 🌟
Surreal! 🤩 With co-founders Martin and Dina, we started Gameto in 2020 with just a silly graph theory algorithm I developed to predict TFs that could differentiate ovarian cells. 💻➡️🧫 Now, little Mia is here with the tech that has grown out of that work. 🐣 So proud!! 🥰
www.forbes.com/sites/alexyo...
Any AIxBio folks at NeurIPS and want to meet up with me and the lab? So many of our best collaborations have come from meetings at NeurIPS, ICML, and ICLR!! 🌟
We are so grateful to #EndAxD for funding our research leveraging generative language models to design peptide-guided degraders of dysregulated GFAP! 🙏 Please share and consider giving to this wonderful, grassroots organization. 💫 endaxd.org
#EndAxD Instagram Post: www.instagram.com/p/DC7sV2GPst...
Yes, definitely. A learned tokenizer is always more complex. The nice thing about ESM-2 is that it uses per-residue tokenization rather than BPE, SentencePiece, or another subword scheme, which lets us get good residue-level embeddings. :)
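For anyone curious, here's a minimal sketch of what per-residue tokenization buys you, assuming the Hugging Face transformers port of ESM-2 (the checkpoint choice is just illustrative):

```python
# Minimal sketch: ESM-2 tokenizes one token per amino acid residue,
# so sequence positions map 1:1 onto embeddings (plus special tokens).
# Checkpoint is illustrative; any ESM-2 size works the same way.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = AutoModel.from_pretrained("facebook/esm2_t6_8M_UR50D")
model.eval()

seq = "MKTAYIAKQR"
inputs = tokenizer(seq, return_tensors="pt")
# One token per residue, plus the <cls> and <eos> specials.
assert inputs["input_ids"].shape[1] == len(seq) + 2

with torch.no_grad():
    out = model(**inputs)
# Residue-level embeddings: drop the two special-token positions.
residue_embeddings = out.last_hidden_state[0, 1:-1]  # (len(seq), hidden_dim)
```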
I worry that during pre-training, the token embeddings ended up with quite expressive representations themselves. Using a special token would work, but you would need to really contextualize its representation, just as the <mask> token's is. Otherwise, I could imagine a drop-off in performance.
Try out Fred's (my PhD student) reimplementation of ESM-2 with FlashAttention, achieving up to 60% memory savings and 70% faster inference! 🚀 No need to change your ESM code — it’s API-compatible! github.com/pengzhangzhi...
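Since the repo link above is truncated, here's a sketch of a standard fair-esm inference call; per the post, code like this should run unchanged against the FlashAttention reimplementation (the exact import path for the new package is whatever the repo documents):

```python
# Standard fair-esm usage; the reimplementation is stated to be
# API-compatible, so this should work as-is with it swapped in.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
labels, strs, tokens = batch_converter(data)

with torch.no_grad():
    results = model(tokens, repr_layers=[33])
embeddings = results["representations"][33]  # per-residue embeddings
```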
Yes we run most of the inference pipelines on A100s and H100s. Haven’t had a problem — A6000s have been fine as well.
Ooh such a good idea!! I’ll try it! :)
Great points! I actually never liked it either and most of the time, it’s hard to effectively debug with everyone watching. 😅
Alright new BlueSky friends, need some advice! 💡 I’m teaching my Generative Models (pLMs, graph models, diffusion, etc.) class at Duke next semester, and want to mix it up! Question: should I do theory on the board ✏️+ live coding 🧑🏾‍💻, or pre-prepared slides 🖥️ with annotated code snippets?
Of course!! Will do! The biggest test will be when we down select generated molecules based on Boltz-1 metrics and we’ll see if they work in the wet lab. 🧫
New RFDiffusion-for-peptides (RFpeptides) paper from @gauravbhardwaj.bsky.social and team at @uwproteindesign.bsky.social! 🌟 Beautiful binding data on 4 highly-structured targets (pLDDT > 90)! 🙌🏾 Not too confident this would work on highly disordered targets, though. 🤔
www.biorxiv.org/content/10.1...
Yeah same. The ByteDance one, Protenix, is quite good, and the engineering from them is always clean!
Yeah nothing easy about it! And the throughput is so low that it’s hard to get a good look at the hit rate of the algorithms without doing a mini display assay. Ahh such is life! 😅
We usually do some hacky ELISAs via biotinylation of the analyte and then SPR the best ones. A horridly cumbersome set of experiments. 😣
Ugh so true!! And as a lab that does peptides, the fact that it’s so slow and expensive to synthesize an 18mer is insanity. 🤦🏾‍♂️ Only alternative is to His-tag purify, which also sucks. And don’t get me started with Kd analysis…still no reliable high-throughput binding affinity measurement. 😣
Agreed!! We’re using the AF3 models to validate our language model-based binder designs to structured targets (and metals, DNA, etc) prior to experimental testing, as a sort of a hint on performance. But of course, the true test is in the lab for us!! 🧫
I’m curious to see how all of the new AF3 mimics perform. 🧐 My lab’s been installing them on our servers, and faster inference and ease-of-use are key for us. Boltz-1 has an early lead, but nothing beats a good frozen pLM with a structure trunk! 😅 Bc accuracy to the PDB isn’t the best metric. 🤷🏾‍♂️
Hi new followers! 🥰 You may know me from Twitter as the sequence-first, pLM guy — hope you will continue to follow my lab’s work! 🥹 While you’re here, check out my lab’s new preprint on delivering pLM-generated degraders via LNPs to degrade cytosolic β-catenin in vivo! www.biorxiv.org/content/10.1...
A strategy that seems to be useful is using heterodimeric PDBs of single proteins and cutting interfaces — there’s a bit more conformational flexibility captured, and our LMs have done better with this noisier data.
We’ve worked to create a similar dataset with minimal leakage, but to do interface prediction from pLM residue embeddings. It’s super tough and we’ve yet to find a good train/test cluster-based split that would achieve this.
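For context, a cluster-based split just means assigning whole sequence clusters, not individual sequences, to train or test. A hypothetical minimal sketch, assuming you've already computed cluster IDs with something like MMseqs2 (all names here are illustrative, not our actual pipeline):

```python
# Hypothetical sketch of a leakage-limiting split: group sequences by a
# precomputed cluster ID, then assign entire clusters to train or test so
# near-duplicate sequences never straddle the split.
import random
from collections import defaultdict

def cluster_split(seq_to_cluster: dict[str, str], test_frac: float = 0.2, seed: int = 0):
    clusters = defaultdict(list)
    for seq_id, cluster_id in seq_to_cluster.items():
        clusters[cluster_id].append(seq_id)
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)
    n_test = max(1, int(len(cluster_ids) * test_frac))
    test = [s for c in cluster_ids[:n_test] for s in clusters[c]]
    train = [s for c in cluster_ids[n_test:] for s in clusters[c]]
    return train, test
```

Of course, for interface prediction the hard part is that leakage can come through either partner of the pair, which is why a clean split is so tough.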
Which paper is this from? I'm not certain the latent spaces are compatible here to create useful protein representations.
SaLT&PepPr is published in Communications Biology! Here, we fine-tune the ESM-2 pLM to identify peptidic binding sites on target-interacting partner sequences. We fuse these "guide" peptides to E3 ubiquitin ligases to degrade disease-causing proteins! Take a read! :) www.nature.com/articles/s42...
Happy to share our early work on generating binding peptides conditioned ONLY on the target sequence! 🌟 PepMLM masks cognate peptides at the end of target protein sequences, and tasks ESM-2 to fully reconstruct the binder region. 😷 arxiv.org/abs/2310.03842
“PepMLM: Target Sequence-Conditioned Generation of Peptide Binders via Masked Language Modeling” 🧶🧬
Fine-tunes the ESM-2 network to achieve “target-conditioned de novo binder design from sequence alone”
arxiv.org/abs/2310.03842
huggingface.co/TianlaiChen/...
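A minimal sketch of the PepMLM-style setup described above: append a fully masked peptide region to the end of the target sequence and let an ESM-2 masked-LM head propose residues for the binder positions. The base checkpoint and greedy argmax decoding here are illustrative assumptions, not the paper's trained weights or exact sampling recipe — the actual fine-tuned model is on the Hugging Face page linked above.

```python
# Sketch: mask a cognate-peptide region at the end of the target sequence
# and reconstruct it with an ESM-2 masked-LM head. Base (non-fine-tuned)
# checkpoint and greedy decoding are stand-ins for illustration only.
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D")
model.eval()

target = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy target sequence
peptide_len = 8
masked = target + tokenizer.mask_token * peptide_len

inputs = tokenizer(masked, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Fill each masked binder position with its argmax residue (greedy).
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
pred_ids = logits[0, mask_positions].argmax(dim=-1)
binder = tokenizer.decode(pred_ids).replace(" ", "")
print(binder)
```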