supervised models not to be trained on the benchmark dataset (or similar data). Different subcategories could also be interesting, for example only VAMP-seq datasets.
Posts by Alexander Gress
learn on previous Fowler MAVEs, and the same was true for the Roth MAVEs, even though Roth used VAMP-seq for the first time for their CAGI7 MAVE, so the bias did not come from the experimental technique. So, I was hoping to have an open benchmark set, where applicants need to prove their ...
Yes, CAGI is great, and I participated in the most recent iteration, but it takes a lot of time and work from many people to have one benchmark every couple of years (or more). Interestingly, it is also not free of bias. It was observed that, when predicting the Fowler MAVE, it was an advantage to just ...
Neat paper! I was sad when Thea's talk in the VESS got canceled (or rescheduled?), since I am also looking into that topic. What I really miss currently is a benchmark like ProteinGym for supervised models that try to generalize to unseen MAVEs. Do you think it would be a doable endeavour?
Information Leakage in Enzyme Substrate Prediction www.biorxiv.org/content/10.64898/2026.02...
We are happy to present a piece of analysis that I consider to have a major impact on our understanding of how good variant effect prediction (VEP) tools really are: www.biorxiv.org/content/10.6....
Call for Abstracts OPEN!
Join us in celebrating 40 years of the German Conference on #Bioinformatics (#GCB2026)
Saarbrücken | 22–25 Sep 2026
Submit your research & shape the future of Bioinformatics!
Deadlines
Workshops: 31 Mar 26
Talks: 3 May 26
Poster: 6 Aug 26
GCB2026.DE
StructGuy: Data leakage free prediction of functional effects of genetic variants. www.biorxiv.org/content/10.64898/2025.12...
We are excited that our paper "Cleanifier: Contamination removal from microbial sequences using spaced seeds of a human pangenome index" is now published in Bioinformatics (doi.org/10.1093/bioi...).
You can find it on GitLab (gitlab.com/rahmannlab/c...) or install it via PyPI or Bioconda.
Detection of alternative splicing: deep sequencing or deep learning? www.biorxiv.org/content/10.1101/2025.08....
SingleRust: A High-Performance Toolkit for Single-Cell Data Analysis at Scale www.biorxiv.org/content/10.1101/2025.08....
Our preprint is finally out for SingleRust:
doi.org/10.1101/2025...
Stay tuned: @singlerust.bsky.social, @ianfd.bsky.social
github.com/SingleRust
What an exciting week it's been at the #ISMBECCB2025 conference! The Drug Bioinformatics group had a fantastic presence, with six members presenting their latest work across various COSIs.
www.linkedin.com/feed/update/...
This week I presented DataSAIL at the #ISMBECCB2025 conference in #Liverpool
It has been an amazing chance and experience to meet many people working on information leakage and to get great ideas for extending it.
Nicely supervised by & collaborated with @dbblumenthal.bsky.social @ok55991.bsky.social
I am very excited to share the publication of our tool StructMAn 2.0 in the NAR webserver issue (10.1093/nar/gkaf381). If you want to annotate protein structures to protein sequences or if you are looking for structural evidence for PPIs, consider using: tools.helmholtz-hips.de/structman/
DataSAIL is out in @naturecomms.bsky.social
Since the preprint, we have improved the work a lot, thanks to countless reviewers and feedback.
You can find it here: nature.com/articles/s41...
Thanks, @dbblumenthal.bsky.social and @ok55991.bsky.social, for helping and supervising me on this journey.