How do you kill a MRSA superbug armed with 15 different anti-phage defense systems? You make a smarter phage. Check out our latest preprint on overcoming bacterial immunity using defense-guided engineering to build durable therapeutic phage cocktails! Led by Sarah Voss. doi.org/10.64898/202...
Posts by Burstein lab
🚀 Our results highlight a promising direction for making protein language models more efficient and scalable. Read all about it! www.biorxiv.org/content/10.6... (5/5)
⚡ Reduced alphabets yield shorter inputs and major runtime gains, while maintaining comparable, and sometimes improved, predictive performance. (4/5)
🔤 By combining Byte Pair Encoding (BPE) with reduced amino acid alphabets based on residue properties, we train new pLMs and evaluate them across diverse biological tasks, like solubility, enzyme, PPI, and stability prediction. (3/5)
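For a concrete picture of the idea, here is a minimal sketch, not the preprint's code. It maps residues to a reduced alphabet and then learns BPE merges on the reduced strings. The six-group alphabet, the group symbols, and the function names (`reduce_sequence`, `learn_bpe`, `tokenize`) are all illustrative assumptions; the posts do not say which groupings or vocabulary sizes were used.

```python
from collections import Counter

# Hypothetical 6-letter reduced alphabet grouping the 20 amino acids by broad
# residue properties. Illustrative only: not the grouping from the preprint.
REDUCED_GROUPS = {
    "h": "AVLIMC",  # hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KR",      # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # structurally special
}
RESIDUE_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its reduced-alphabet symbol."""
    return [RESIDUE_TO_GROUP[aa] for aa in seq]


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merges from a list of token lists."""
    corpus = [list(tokens) for tokens in corpus]
    merges = []
    for _ in range(num_merges):
        counts = Counter()
        for tokens in corpus:
            counts.update(zip(tokens, tokens[1:]))
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        merges.append(best)
        corpus = [merge_pair(tokens, best) for tokens in corpus]
    return merges


def tokenize(seq, merges):
    """Reduce the alphabet, then apply the learned merges in order."""
    tokens = reduce_sequence(seq)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


proteins = ["MKVLAAGIV", "MRILAVGLV"]  # toy sequences, not real data
merges = learn_bpe([reduce_sequence(p) for p in proteins], num_merges=1)
print(tokenize(proteins[0], merges))  # 9 residues become 6 tokens
```

In this toy case the two sequences differ at four positions but collapse to the same reduced string, so a single merge already shortens both inputs. Under these assumptions, that shows how a smaller alphabet can let BPE find frequent pairs more easily.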
⚖️ Unlike natural languages, proteins aren’t clearly separated into “words”, making tokenization tricky. Short tokens create long sentences, while long tokens lead to a sparse vocabulary that is hard to learn. But reducing the alphabet size might help! (2/5)
🧬 New preprint alert!
Protein language models have transformed biology - but what about the tokens they read?
In our new preprint, 👑@EllaRannon👑 studies how tokenization choices shape pLM performance and efficiency. 🧵 (1/5)
www.biorxiv.org/content/10.6...
4/4 To allow standardized benchmarking for future tool development, we also released B-PPI-DB, a curated bacterial PPI database derived from STRING: doi.org/10.5281/zeno...
We hope B-PPI is just the first of many efficient bacterial PPI predictors!
3/4 B-PPI outperforms other rapid methods for bacterial PPI prediction without the high cost of structural folding and generalizes to unseen interactions with minimal fine-tuning.
2/4 B-PPI is a cross-attention model for bacterial PPI prediction at scale. Given protein pairs, it leverages ProstT5, a structure-aware protein language model, to generate embeddings, and outputs the interaction probability.
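To make the cross-attention idea concrete, here is a minimal sketch under stated assumptions, not B-PPI's actual architecture. Each protein is a small matrix of per-residue embeddings (toy numbers standing in for ProstT5 output). Each protein attends over the other with scaled dot-product attention, and a hypothetical linear head with a sigmoid turns the pooled result into a probability. The function names, the pooling, and the head weights are all invented for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query row attends over `keys_values`.

    No learned projections are used here (an assumption made for brevity).
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                         for j in range(d)])
    return attended


def mean_pool(rows):
    """Average a list of equal-length vectors into one vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def interaction_probability(emb_a, emb_b, head_weights, bias=0.0):
    """Attend in both directions, pool, and apply a hypothetical linear head."""
    a_to_b = mean_pool(cross_attention(emb_a, emb_b))
    b_to_a = mean_pool(cross_attention(emb_b, emb_a))
    features = a_to_b + b_to_a  # concatenation of the two pooled vectors
    logit = sum(w * f for w, f in zip(head_weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy per-residue embeddings (2 and 3 residues, dimension 2); not real data.
protein_a = [[1.0, 0.0], [0.0, 1.0]]
protein_b = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
print(interaction_probability(protein_a, protein_b, [0.1, -0.2, 0.3, 0.4]))
```

A trained model would learn the projections and the head from labeled pairs. The sketch only shows how information flows from a pair of embeddings to a single probability.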
1/4 Ever wanted to predict bacterial protein-protein interactions (PPI) on a large scale?
We did, but realized there was no algorithm that is both rapid and optimized for bacterial proteins.
This led our ⭐️Chen Agassy⭐️ to develop B-PPI: doi.org/10.64898/202...
6/6 🔮 What's next for NLP in biology? We discuss future directions as well. Join us in exploring the future of this exciting field! arxiv.org/abs/2506.02212
5/6 💡 Discover how NLP is being applied to:
• Protein structure prediction 🏗️
• Taxonomic classification 🌳
• Mutational effect prediction 🔀
• Gene expression prediction 📈
And much more!
4/6 🧩 Tokenization challenges? We've got that covered too! Explore different approaches to breaking down biological sequences and their impact on model performance.
3/6 📚 We break down the evolution of NLP models in biology, from classic word2vec to cutting-edge transformers and Hyena operators. Understand their strengths, limitations, and exciting applications!
2/6 🔬 We dive deep into how NLP techniques are revolutionizing the analysis of biological 'languages':
• DNA 🧬
• RNA 🧬
• Proteins 💪
• Entire genomes 🔍
Learn how these methods are unlocking new insights in genomics!
1/6 🧬📊 Curious about the buzz around NLP in biology? Feeling overwhelmed by the rapid developments? We've got you covered! Our review on NLP applications in genomics, by the wonderful Ella Rannon, is now out as a preprint! #NLP #Bioinformatics arxiv.org/abs/2506.02212
Hello Bluesky! Burstein lab just joined - are we too late for the party 👀 ??