SeqHub MSA is live, integrated with structure vis! Another user-requested feature 🤝
Posts by Yunha Hwang
We’re incredibly grateful to have Alex Bateman on our advisory board. Biology today would look very different without UniProt. Scientific data infrastructure is the bedrock of innovation, and we’re excited to learn from Alex’s experience helping build such a foundational resource.
Most protein-protein interaction tools work on protein pairs. FlashPPI runs at proteome scale and now across two proteomes at once.
Upload any two datasets (full genomes, partial genomes, or custom protein sets) and get back a predicted interaction network spanning both.
We're hosting a live walkthrough of FlashPPI in SeqHub on April 15 at 11am EST.
We'll briefly discuss our protein-protein interaction model then walk through how you can use it in SeqHub.
Register here: forms.gle/iBQrpYnLeiF1...
My group at MIT is seeking a research scientist with a strong *experimental* background to lead and help shape the lab’s experimental infrastructure, supporting efforts to advance AI-driven enzyme discovery and characterization.
See the full JD here: acrobat.adobe.com/id/urn:aaid:...
Applications for MIT Novo-Nordisk AI postdoc fellowships are due Apr 15. The focus areas list AI and Biology topics. Apply to work in this exciting field with amazing peers! engineering.mit.edu/novo-nordisk
Thanks for the idea! We briefly checked this: for the E. coli test set predictions, ~80% of the high-confidence interactions are between proteins whose genes are more than 5 genes apart, so a large fraction is non-syntenic!
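As a rough illustration of the check described in this reply, here is a minimal sketch that computes the fraction of predicted pairs whose genes sit more than 5 positions apart. The function name `fraction_distant`, the optional `n_genes` argument for circular chromosomes, and the toy gene indices are all hypothetical; this is not the analysis code behind the ~80% figure.

```python
def fraction_distant(pairs, min_gap=5, n_genes=None):
    """Return the fraction of pairs whose genes are more than min_gap apart.

    pairs: list of (i, j) gene-order indices for each predicted interaction.
    n_genes: if given, treat the chromosome as circular with this many genes
             (an assumption; the wrap-around distance is then considered).
    """
    distant = 0
    for i, j in pairs:
        gap = abs(i - j)
        if n_genes is not None:
            # On a circular chromosome the shorter way around may wrap.
            gap = min(gap, n_genes - gap)
        if gap > min_gap:
            distant += 1
    return distant / len(pairs)


# Toy predicted pairs (made-up indices, not real E. coli data).
toy_pairs = [(10, 11), (3, 250), (40, 47), (100, 900), (7, 12)]
frac = fraction_distant(toy_pairs)
print(frac)
```

Pairs that fall at or under the gap threshold are the ones a purely synteny-based method could plausibly recover, so a high fraction above it suggests the predictions are not just reflecting gene neighborhood.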
For virus-microbe, yes (we have examples in the paper)! For microbe-host, we haven't fully evaluated how this would work for eukaryotic proteomes.
We thought a lot about how to deploy 𝑭𝒍𝒂𝒔𝒉𝑷𝑷𝑰, and we are very proud of this implementation that integrates annotation+context+CoSearch+agent with FlashPPI on SeqHub!
Thanks for pointing this out! We will add an option to download the network!
Step-by-step how to run FlashPPI on your favorite genomes!
Predicting protein-protein interactions (PPIs) at proteome scale can take months with co-folding models due to the massive all-vs-all comparisons required.
We are excited to announce FlashPPI, a contrastive learning framework that predicts proteome-wide physical interfaces in minutes. 1/🧵
Preprint: www.biorxiv.org/content/10.6...
For a typical microbial genome, all-vs-all PPI prediction with AF3 would take hundreds of GPU-years. With FlashPPI, we can scale molecular interaction prediction across diverse, non-model microbial genomes, unlocking truly scalable discovery. We deployed FlashPPI on Seqhub.org, give it a spin!
3. Online hard negative mining improves sensitivity.
We use joint optimization to let the model propose hard negatives for contact prediction during training. This results in even more sensitive and robust performance.
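To make the idea of hard negatives concrete, here is a generic sketch of in-batch hard negative selection: for each anchor protein, pick the non-partner that the current model scores highest. This is a common contrastive-learning pattern and only an assumption about the general flavor of the approach; the function `mine_hard_negatives`, the toy similarity matrix, and the partner assignments are hypothetical and do not reproduce FlashPPI's joint optimization with contact prediction.

```python
def mine_hard_negatives(sim, positives):
    """For each anchor i, return the index of its hardest negative.

    sim: square list-of-lists, sim[i][j] = current model score for (i, j).
    positives: positives[i] = index of the known partner of anchor i.
    The hardest negative is the highest-scoring candidate that is neither
    the anchor itself nor its known partner.
    """
    hard = []
    for i, row in enumerate(sim):
        candidates = [
            (score, j)
            for j, score in enumerate(row)
            if j != i and j != positives[i]
        ]
        hard.append(max(candidates)[1])
    return hard


# Toy scores for 4 proteins; (0, 1) and (2, 3) are the true pairs.
toy_sim = [
    [1.0, 0.9, 0.7, 0.1],
    [0.9, 1.0, 0.2, 0.6],
    [0.7, 0.2, 1.0, 0.8],
    [0.1, 0.6, 0.8, 1.0],
]
toy_positives = [1, 0, 3, 2]
hard_negatives = mine_hard_negatives(toy_sim, toy_positives)
print(hard_negatives)
```

Because the selection uses the model's own current scores, the negatives get harder as training progresses, which is the "online" part of the term.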
2. Learning how proteins interact matters
It's not enough to learn that two proteins interact; learning *how* they interact at the residue level is critical for performance.
Some fun highlights on what we learned along the way:
1. Reframing PPI prediction as retrieval
Instead of asking “Do A and B interact?”, we ask: Which proteins does A interact with in this genome? This shift in framing enables linear-time scaling and ultrafast performance.
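A minimal sketch of what the retrieval framing can look like, assuming each protein is embedded once and candidate partners are then ranked by cosine similarity. Under that assumption the expensive model calls grow linearly with the number of proteins instead of with the number of pairs. The embeddings, protein IDs, and the `top_partners` helper are made up for illustration and are not FlashPPI's actual model or API.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_partners(query_id, embeddings, k=2):
    """Rank all other proteins in the genome by similarity to the query.

    embeddings: dict mapping protein ID -> embedding vector, computed once
    per protein (the step assumed to dominate the cost).
    """
    q = normalize(embeddings[query_id])
    scores = []
    for pid, vec in embeddings.items():
        if pid == query_id:
            continue
        v = normalize(vec)
        scores.append((sum(a * b for a, b in zip(q, v)), pid))
    scores.sort(reverse=True)
    return [pid for _, pid in scores[:k]]


# Toy 3-dimensional embeddings (hypothetical values).
toy_embeddings = {
    "protA": [1.0, 0.0, 0.0],
    "protB": [0.9, 0.1, 0.0],
    "protC": [0.1, 1.0, 0.0],
    "protD": [0.0, 0.0, 1.0],
}
partners = top_partners("protA", toy_embeddings, k=2)
print(partners)
```

The pairwise question "Do A and B interact?" would instead need one model call per pair, which is what makes all-vs-all prediction so costly at proteome scale.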
For technical details, check out @ancornman1’s excellent breakdown of the model. bsky.app/profile/anco...
Protein–protein interactions (PPIs) are key to discovering and interpreting new biological functions.
We’re excited to introduce 𝑭𝒍𝒂𝒔𝒉𝑷𝑷𝑰: a new application of gLM2 that uses genomic language modeling to predict proteome-wide PPIs in microbial genomes in minutes.
We’d love to join your lab meeting!
We’ve been meeting with research groups to share how scientists are using SeqHub for sequence and genome analysis, and the conversations have been highly interactive and grounded in real workflows.
Booking info below.
We’re excited to welcome Daniela Bourges-Waldegg to the SeqHub Advisory Board!
Daniela is EVP + Chief Digital & Technology Officer at @addgene.bsky.social. She will help shape our approach to building researcher-centered digital infrastructure with an eye toward long-term scientific impact.
First, @tattabio.bsky.social is now on Bluesky! 💙 And second, we launched multi-sequence CoSearch on SeqHub!
This. Is. So. Cool. 🤯
Hi Roland, our servers are in the US, we explicitly state in our docs that we do not train models on private data, and the data is private to you only - unless intentionally made public (for publication/data sharing purposes)!
thanks for the feedback! We are working on making more of the platform exportable as figures😊
Thank you for the shoutout!
Released today from Tatta Bio: SeqHub! A place to explore, annotate, and share sequence data with functional insights.
Over 1,000 scientists worldwide have already used SeqHub to annotate more than 550,000 proteins, uncovering new insights and accelerating discovery.
Annotations are mapped using embedding-based search, making it faster than most alignment-based search. HMM prediction speed-up comes from some optimization and parallelization :)
Thank you! And the PaperBLAST team deserves a shoutout for the sequence-paper linkages.