Posts by Botond Sipos
Amos was not only a giant of bioinformatics and biocuration, but one of the nicest people I've met in academia. His support and advice were invaluable when we were establishing @bgee.org, and I will always remember how warmly he welcomed us to @sib.swiss when I arrived in Switzerland 20 years ago.
We are thrilled to announce the first official release (v0.1.8) of #𝗯𝗲𝗱𝗱𝗲𝗿, the successor to one of our flagship tools, #𝗯𝗲𝗱𝘁𝗼𝗼𝗹𝘀! Based on ideas we conceived of long ago (!), this was achieved thanks to the dedication of Brent Pedersen.
1/n
Displacement-Optimized Tanglegrams for Trees and Networks www.biorxiv.org/content/10.1101/2025.11....
https://link.springer.com/article/10.1007/s00239-025-10277-1 Conceptual overview of hierarchical orthologous groups. An example of one HOG, or gene family. A Species tree with four taxa: plant (green), fish (blue), human (orange), and mouse (yellow), each with one or more genes. B The implied gene tree, dubbed “HOG tree,” and inferred nested HOG composition. Duplication nodes (red) can be deduced based on the species tree topology and clusters of homologous genes at each level. Ancestral genes from which the HOGs descended are shown in gray. C HOGs returned at different taxonomic levels. Consider a gene family that was present in the last eukaryotic common ancestor (LECA). At this level, a single HOG encompasses all genes descending from that ancestral gene. At the Vertebrata level, this gene underwent duplication, leading to two distinct copies, i.e., HOGs. At the Mammalia level, a second duplication further subdivides one of these HOGs, showing how deeper HOGs split into nested subHOGs at more recent levels. The HOG composition implies that a loss event occurred after the mammalian speciation
https://link.springer.com/article/10.1007/s00239-025-10272-6 Summary of the QfO8 meeting. a Hot topics and future directions in method development and applications within the QfO community, namely artificial intelligence, protein domains, protein structure, RNA and splicing isoforms. b Definition of orthology and paralogy, including various paralogous subtypes (e.g. in-paralogs and out-paralogs). c Duplications and functional divergence. d Applications of orthology
https://link.springer.com/article/10.1007/s00239-025-10271-7 Overview of the OrthoXML File Format (simplified). A schematic representation of an OrthoXML file, a standardized XML-based format for representing orthology data. OrthoXML follows a hierarchical structure where elements are enclosed within opening <tag> and closing </tag> tags. <orthoXML> is the root element enclosing other elements. The <species> element contains information about genes. An OrthoXML file can include a <taxonomy> element, which specifies the species tree used to generate the file. Additionally, the <groups> element encapsulates the orthology and paralogy relationships among genes
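To make the hierarchy in the caption concrete, here is a minimal Python sketch that reads a tiny OrthoXML-style document with the standard library. It is an illustration under assumptions, not the OrthoXML-Tools API: the gene identifiers, the database name, and the group id are invented, the helper `read_groups` is hypothetical, and the element and attribute layout follows the v0.3 schema as I understand it, so check it against the official schema before relying on it. The optional <taxonomy> element is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the OrthoXML 0.3 schema; verify against your own files.
NS = {"ox": "http://orthoXML.org/2011/"}

# Toy document: all names and identifiers below are made up for illustration.
EXAMPLE = """
<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="example" originVersion="1">
  <species name="Homo sapiens" NCBITaxId="9606">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="1" geneId="HUMAN_G1"/>
      </genes>
    </database>
  </species>
  <species name="Mus musculus" NCBITaxId="10090">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="2" geneId="MOUSE_G1"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="HOG1">
      <geneRef id="1"/>
      <geneRef id="2"/>
    </orthologGroup>
  </groups>
</orthoXML>
"""


def read_groups(xml_text):
    """Map each top-level ortholog group id to its (species, geneId) members."""
    root = ET.fromstring(xml_text.strip())

    # First pass: index every gene by its internal id, remembering its species.
    genes = {}
    for species in root.findall("ox:species", NS):
        for gene in species.iterfind(".//ox:gene", NS):
            genes[gene.get("id")] = (species.get("name"), gene.get("geneId"))

    # Second pass: resolve the geneRef entries directly inside each group.
    # Nested subgroups are ignored here to keep the sketch short.
    groups = {}
    for group in root.findall("ox:groups/ox:orthologGroup", NS):
        refs = group.findall("ox:geneRef", NS)
        groups[group.get("id")] = [genes[ref.get("id")] for ref in refs]
    return groups


if __name__ == "__main__":
    for group_id, members in read_groups(EXAMPLE).items():
        print(group_id, members)
```

Real files nest <orthologGroup> and <paralogGroup> elements to encode hierarchical groups, so a fuller reader would walk the groups recursively instead of reading only direct <geneRef> children.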
Our trilogy of orthology publications is online!
Review on Hierarchical Orthologous Groups doi.org/10.1007/s00239-025-10277-1
OrthoXML-Tools doi.org/10.1007/s00239-025-10271-7
A great community effort on Quest for Orthologs in the era of Data Deluge and AI doi.org/10.1007/s00239-025-10272-6
Great work by Nicola De Maio and Nick Goldman - not just scalable to "pandemic scale" trees but - if I have got this right - arguably more valid than the traditional column-based bootstrap in the context of very tight evolution.
Yes - isometric scaling as a way to understand the benefits and costs of being small versus large. Haldane's Harper's article from 1926 is an amazing example of popular science writing.
Can an AI tool help us better understand the origins of cancer?
Researchers from EMBL's Korbel Group have developed a new AI method – MAGIC – which, through a game of molecular laser tag, is shedding light on how chromosomal abnormalities form in cells.
www.embl.org/news/science...
#Annotating the genome at single-nucleotide resolution with #DNA foundation models www.nature.com/articles/s41...
Unlocking the regulatory code of #RNA: launching the Human #RNome Project genomebiology.biomedcentral.com/articles/10....
I am genuinely impressed by large language models - they can absorb disparate components of text into some consolidated view, they can produce extremely good language and - with the right model - translate pretty well between languages, and they are an excellent text-based UI for humans to use. But..
Think of AI labs as Cronos, a titan in Greek mythology, trying to devour his children. The question, as with Cronos, is: can the little ones survive and fight back?
Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.
AGNES: Adaptive Graph Neural Network and Dynamic Programming Hybrid Framework for Real-Time #Nanopore Seed Chaining arxiv.org/abs/2510.16013
Full comic here: www.smbc-comics.com/comic/signal-4 #smbc
Biological life depends on two families of large molecule: nucleic acids and proteins. The first of our collection of primers explains what they are and how they work
I am looking to get my hands on some #Illumina 5-base methylation data - does anyone have a bam file that I could use for some testing? Please RT for reach!
Bayesian probability, like frequentist probability, is a model-based activity that is mathematically anchored by physical randomization at one end and calibration to a reference set at the other
statmodeling.stat.columbia.edu/2025/10/20/b...
Illustration of Burrows-Wheeler Transform and many auxiliary structures from the input string how$now$brown$cow$#
New tool "bwt-svg" for making illustrations of the BWT and the many auxiliary arrays and other structures related to it. Pyodide-based no-installation-necessary interface here: benlangmead.github.io/bwt-svg/. (H/t to @robert.bio for pointing me to pyodide!) Full repo: github.com/benlangmead/....
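For readers who want to see what such an illustration is built from, here is a small Python sketch that computes the suffix array and the BWT of the same input string and then inverts the transform with the LF-mapping. It is not taken from bwt-svg: the function names are mine, the suffix sort is the naive quadratic one, and the code assumes the final `#` is a unique terminator that sorts before every other character (true in ASCII for this string, where `#` precedes `$`).

```python
from collections import Counter


def suffix_array(text: str) -> list[int]:
    """Naive suffix array: sort suffix start positions by the suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str) -> str:
    """BWT via the suffix array: take the character preceding each sorted suffix.

    For position 0, text[-1] wraps around to the terminator.
    """
    return "".join(text[i - 1] for i in suffix_array(text))


def inverse_bwt(last: str, terminator: str = "#") -> str:
    """Rebuild the original text from the BWT using the LF-mapping.

    Assumes `terminator` occurs once and is the smallest character,
    so row 0 of the sorted matrix starts with it.
    """
    counts = Counter(last)

    # first_row[c] = index of the first row that starts with character c.
    first_row = {}
    total = 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    # lf[i] = row whose first character is the same occurrence as last[i].
    seen = Counter()
    lf = []
    for ch in last:
        lf.append(first_row[ch] + seen[ch])
        seen[ch] += 1

    # Walk backwards through the text, starting from the terminator's row.
    out = [terminator]
    row = 0
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


if __name__ == "__main__":
    text = "how$now$brown$cow$#"
    transformed = bwt(text)
    print(suffix_array(text))
    print(transformed)
    print(inverse_bwt(transformed) == text)
```

The naive sort is fine for strings of this size; real indexes use linear-time suffix array construction and compressed rank structures for the LF-mapping.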
A paper from @lachlanjmc.bsky.social Lachlan Coin, not active here for the past month, on Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing academic.oup.com/gigascience/...
🚨 New preprint alert 🚨
We systematically benchmarked @nanoporetech.com 's modification-aware basecalling models released for RNA on sets of in vitro and in vivo sequences and made some curious observations 🧬🔍.
bit.ly/4lXqNul
Follow along for a little recap (1/12)
Claus Wilke on Alphafold and the problem of protein folding in 2025
Go 1.25 interactive tour
Go 1.25 is scheduled for release in August, so it's a good time to explore what's new.
#golang
antonz.org/go-1-25/
Excited to launch our AlphaGenome API goo.gle/3ZPUeFX along with the preprint goo.gle/45AkUyc describing and evaluating our latest DNA sequence model powering the API. Looking forward to seeing how scientists use it! @googledeepmind
New paper from the lab, by Sriram Garg in my group. We introduce a general substitution matrix for structural phylogenetics. I think this is a big deal, so read on below if you think deep history is important. academic.oup.com/mbe/advance-...
Vaughan & @tanjastadler.bsky.social develop a method to infer multitype population trajectories and apply it to MERS-CoV, revealing transmission patterns between camels and humans.
🔗 doi.org/10.1093/molbev/msaf130
#evobio #molbio #virus
FastGA: Fast Genome Alignment www.biorxiv.org/content/10.1... 🧬🖥️🧪 www.github.com/thegenemyers...
Powerful stuff from @juliosaezrod.bsky.social who found himself on the other end of the process - as a patient not a computational biology researcher - giving him insight into both research and patient perspectives. Huge credit to Julio for talking about his experiences here
Michael Ashburner FRS was an influential figure in the fields of Drosophila genomics and early sequencing database initiatives such as @ebi.embl.org.
Read about his contributions across genetics and bioinformatics in the new biographical memoir: buff.ly/f01zNat
@geneticscam.bsky.social
Preprint on "Improving spliced alignment by modeling splice sites with deep learning". It describes minisplice for modeling splice signals. Minimap2 and miniprot now optionally use the predicted scores to improve spliced alignment.
arxiv.org/abs/2506.12986