Posts by Botond Sipos

The AI Rewrite Dilemma

Blog post on "The AI Rewrite Dilemma": lh3.github.io/2026/04/17/t...

4 days ago

Amos was not only a giant of bioinformatics and biocuration, but one of the nicest people I've met in academia. His support and advice were invaluable when we were establishing @bgee.org, and I will always remember how warmly he welcomed us to @sib.swiss when I arrived in Switzerland 20 years ago.

4 months ago
Intro to Bedder – The Quinlan Lab

We are thrilled to announce the first official release (v0.1.8) of #𝗯𝗲𝗱𝗱𝗲𝗿, the successor to one of our flagship tools, #𝗯𝗲𝗱𝘁𝗼𝗼𝗹𝘀! Based on ideas we conceived of long ago (!), this was achieved thanks to the dedication of Brent Pedersen.

1/n

4 months ago

Displacement-Optimized Tanglegrams for Trees and Networks www.biorxiv.org/content/10.1101/2025.11....

4 months ago
https://link.springer.com/article/10.1007/s00239-025-10277-1
Conceptual overview of hierarchical orthologous groups. An example of one HOG, or gene family. A Species tree with four taxa: plant (green), fish (blue), human (orange), and mouse (yellow), each with one or more genes. B The implied gene tree, dubbed “HOG tree,” and inferred nested HOG composition. Duplication nodes (red) can be deduced based on the species tree topology and clusters of homologous genes at each level. Ancestral genes from which the HOGs descended are shown in gray. C HOGs returned at different taxonomic levels. Consider a gene family that was present in the last eukaryotic common ancestor (LECA). At this level, a single HOG encompasses all genes descending from that ancestral gene. At the Vertebrata level, this gene underwent duplication, leading to two distinct copies, i.e., HOGs. At the Mammalia level, a second duplication further subdivides one of these HOGs, showing how deeper HOGs split into nested subHOGs at more recent levels. The HOG composition implies that a loss event occurred after the mammalian speciation.
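To make the nesting in panel C concrete, here is a minimal Python sketch that cuts a gene tree into HOGs at a chosen taxonomic level. The tree, the gene names (P1, F1, H1, ...) and the convention that a duplication node is labelled with the taxon on whose lineage it happened are hypothetical choices loosely modelled on the caption; this is not the review's algorithm or data.

```python
from dataclasses import dataclass, field

# Hypothetical species tree from the caption, stored as child -> parent.
PARENT = {"plant": "LECA", "Vertebrata": "LECA", "fish": "Vertebrata",
          "Mammalia": "Vertebrata", "human": "Mammalia", "mouse": "Mammalia"}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors up to the root."""
    out = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        out.append(taxon)
    return out


@dataclass
class Node:
    taxon: str                # species for leaves, otherwise taxon of the event
    event: str = "leaf"       # "leaf", "speciation" or "duplication"
    gene: str = ""
    children: list = field(default_factory=list)


def leaves(node):
    if node.event == "leaf":
        return [node]
    return [g for child in node.children for g in leaves(child)]


def hogs_at(node, level):
    """One HOG (sorted gene list) per ancestral gene present at `level`."""
    inside = level in lineage(node.taxon)
    splits_here = node.event == "duplication" and node.taxon == level
    if inside and not splits_here:
        return [sorted(g.gene for g in leaves(node))]
    # Above `level`, outside its clade, or a duplication on the lineage leading
    # to `level`: children may descend from different ancestral genes, so recurse.
    return [h for child in node.children for h in hogs_at(child, level)]


def leaf(species, gene):
    return Node(species, gene=gene)


# Hypothetical family: a Vertebrata duplication, then a Mammalia duplication in
# one copy, after which the mouse lost one of the two mammalian copies.
tree = Node("LECA", "speciation", children=[
    leaf("plant", "P1"),
    Node("Vertebrata", "duplication", children=[
        Node("Vertebrata", "speciation", children=[
            leaf("fish", "F1"),
            Node("Mammalia", "speciation", children=[leaf("human", "H1"), leaf("mouse", "M1")]),
        ]),
        Node("Vertebrata", "speciation", children=[
            leaf("fish", "F2"),
            Node("Mammalia", "duplication", children=[
                Node("Mammalia", "speciation", children=[leaf("human", "H2"), leaf("mouse", "M2")]),
                Node("Mammalia", "speciation", children=[leaf("human", "H3")]),
            ]),
        ]),
    ]),
])

for level in ("LECA", "Vertebrata", "Mammalia"):
    print(level, hogs_at(tree, level))
# LECA [['F1', 'F2', 'H1', 'H2', 'H3', 'M1', 'M2', 'P1']]
# Vertebrata [['F1', 'H1', 'M1'], ['F2', 'H2', 'H3', 'M2']]
# Mammalia [['H1', 'M1'], ['H2', 'M2'], ['H3']]
```

The single LECA group splits into two at Vertebrata and three at Mammalia, and the lone H3 group is where the implied mouse loss shows up.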

https://link.springer.com/article/10.1007/s00239-025-10272-6
Summary of the QfO8 meeting. a Hot topics and future directions in method development and applications within the QfO community, namely artificial intelligence, protein domains, protein structure, RNA and splicing isoforms. b Definition of orthology and paralogy, including various paralogous subtypes (e.g. in-paralogs and out-paralogs). c Duplications and functional divergence. d Applications of orthology.

https://link.springer.com/article/10.1007/s00239-025-10271-7
Overview of the OrthoXML File Format (simplified). A schematic representation of an OrthoXML file, a standardized XML-based format for representing orthology data. OrthoXML follows a hierarchical structure where elements are enclosed within opening <tag> and closing </tag> tags. <orthoXML> is the root element enclosing other elements. The <species> element contains information about genes. An OrthoXML file can include a <taxonomy> element, which specifies the species tree used to generate the file. Additionally, the <groups> element encapsulates the orthology and paralogy relationships among genes.
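The hierarchy described in that caption can be walked with Python's standard-library ElementTree. The document below is a hypothetical, heavily simplified file that only mimics the elements named above: real OrthoXML files declare an XML namespace (so tag lookups must include it) and carry more attributes. The `database`, `genes`, `gene`, `geneRef`, `orthologGroup` and `paralogGroup` names and the `protId` attribute are assumed from the OrthoXML format and are worth checking against the official schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified file: no namespace and few attributes, unlike real OrthoXML.
DOC = """
<orthoXML>
  <species name="human">
    <database name="toy"><genes>
      <gene id="1" protId="H1"/>
      <gene id="2" protId="H2"/>
    </genes></database>
  </species>
  <species name="mouse">
    <database name="toy"><genes>
      <gene id="3" protId="M1"/>
    </genes></database>
  </species>
  <groups>
    <orthologGroup id="hog1">
      <paralogGroup>
        <orthologGroup><geneRef id="1"/><geneRef id="3"/></orthologGroup>
        <geneRef id="2"/>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>
"""


def gene_index(root):
    """Map gene id -> (species name, protein id) using the <species> blocks."""
    index = {}
    for species in root.iter("species"):
        for gene in species.iter("gene"):
            index[gene.get("id")] = (species.get("name"), gene.get("protId"))
    return index


def members(group, index):
    """Protein ids referenced anywhere below a group, including nested groups."""
    return sorted(index[ref.get("id")][1] for ref in group.iter("geneRef"))


root = ET.fromstring(DOC.strip())
index = gene_index(root)
top_level = root.find("groups").findall("orthologGroup")
print({g.get("id"): members(g, index) for g in top_level})
# {'hog1': ['H1', 'H2', 'M1']}
```

Inside `hog1`, the `paralogGroup` records a duplication: H1 and M1 sit together in a nested ortholog group, while H2 is a paralog of both.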

Our trilogy of orthology publications is online!
Review on Hierarchical Orthologous Groups doi.org/10.1007/s00239-025-10277-1

OrthoXML-Tools doi.org/10.1007/s00239-025-10271-7

A great community effort on Quest for Orthologs in the era of Data Deluge and AI doi.org/10.1007/s00239-025-10272-6

5 months ago

Great work by Nicola De Maio and Nick Goldman - not just scalable to "pandemic scale" trees but - if I have got this right - arguably more valid than the traditional column-based bootstrap when sequences are very closely related.
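For anyone unfamiliar with the baseline, the traditional (Felsenstein) bootstrap builds each replicate by resampling alignment columns with replacement and re-inferring the tree. The Python sketch below shows only that resampling step on a made-up three-sequence alignment; it is not De Maio and Goldman's method. One intuition for why column resampling can be awkward with very closely related sequences: a replicate misses any given column with probability (1 - 1/n)^n, roughly 0.37 for long alignments, so the handful of variable sites can easily drop out.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in picks) for name, seq in alignment.items()}


# Made-up alignment: only columns 2 and 4 (0-based) are variable.
aln = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "ACTTTC"}
rep = bootstrap_replicate(aln, random.Random(42))
print(rep)
```

Each replicate keeps the sequence names and alignment length; only which columns appear (and how often) changes.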

5 months ago

Yes - isometric scaling as a way to understand the benefits and costs of being small versus large. Haldane's 1926 Harper's article is an amazing example of popular science writing.
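The square-cube arithmetic behind that argument fits in a few lines. This Python sketch assumes pure isometric scaling by a linear factor k and constant density, so areas (including bone and muscle cross-sections) grow as k^2 while volume and mass grow as k^3; the function name is just for illustration.

```python
def isometric_scaling(k):
    """Relative changes when every length of an organism is multiplied by k."""
    area = k ** 2      # surfaces and cross-sections (bone, muscle)
    mass = k ** 3      # volume, hence mass at constant density
    return {
        "area": area,
        "mass": mass,
        "load_per_cross_section": mass / area,   # grows as k
        "surface_to_volume": area / mass,        # shrinks as 1/k
    }


print(isometric_scaling(10))
# {'area': 100, 'mass': 1000, 'load_per_cross_section': 10.0, 'surface_to_volume': 0.1}
```

So a tenfold taller animal puts ten times the load on each unit of bone cross-section, while having only a tenth of the surface per unit volume.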

5 months ago

Can an AI tool help us better understand the origins of cancer?

Researchers from EMBL's Korbel Group have developed a new AI method – MAGIC – which, through a game of molecular laser tag, is shedding light on how chromosomal abnormalities form in cells.

www.embl.org/news/science...

5 months ago
Annotating the genome at single-nucleotide resolution with DNA foundation models - Nature Methods By leveraging the power of pretrained DNA foundation models, SegmentNT achieves performant genome annotation through segmenting different genic and regulatory elements.

#Annotating the genome at single-nucleotide resolution with #DNA foundation models www.nature.com/articles/s41...

5 months ago
Unlocking the regulatory code of RNA: launching the Human RNome Project | Genome Biology | Full Text The human RNome, the complete set of RNA molecules in human cells, arises through complex processing and includes diverse molecular species. While research traditionally focuses on four canonical nucleotide residues, the RNome, encompassing over 180 distinct modifications across organisms, with at least 50 in humans, is increasingly recognized. These modifications play critical roles in regulating RNA structure, stability, and function, yet the rules linking their precise locations to biological outcomes remain poorly defined. The Human RNome Project aims to map all RNA modifications, build essential resources, and harness new technologies to transform RNA biology, therapeutic development, agriculture, and even data storage.

Unlocking the regulatory code of #RNA: launching the Human #RNome Project genomebiology.biomedcentral.com/articles/10....

5 months ago

I am genuinely impressed by large language models - they can absorb disparate components of text into some consolidated view, they can produce extremely good language and - with the right model - translate pretty well between languages and they are an excellent text based UI for humans to use. But..

5 months ago
OpenAI and Anthropic v app developers: tech’s Cronos syndrome Will the labs devour the apps that run on their models?

Think of AI labs as Cronos, a titan in Greek mythology, trying to devour his children. The question, as with Cronos, is: can the little ones survive and fight back?

5 months ago

Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.
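Demultiplexing boils down to finding barcodes in reads while tolerating sequencing errors. As a rough illustration of approximate DNA search in general (the textbook Sellers dynamic programme, not Sassy's own, much faster algorithm), this Python sketch reports every end position in a read where a pattern matches with at most k edits; the function name and toy sequences are hypothetical.

```python
def approx_search(pattern, text, k):
    """Sellers' algorithm: (end_index, edits) wherever pattern matches text with <= k edits."""
    m = len(pattern)
    col = list(range(m + 1))  # pattern prefix of length i vs. empty text costs i
    hits = []
    for j, t in enumerate(text):
        new = [0]  # a match may start at any text position for free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            new.append(min(col[i - 1] + cost,  # match or substitution
                           col[i] + 1,         # extra base in the text
                           new[i - 1] + 1))    # base missing from the text
        col = new
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits


print(approx_search("ACGT", "TTACGTTACTT", 1))
# [(4, 1), (5, 0), (6, 1), (9, 1), (10, 1)]
```

The exact hit ends at index 5; the neighbouring one-edit hits around it are typical, and real demultiplexers have to decide how to report such clusters.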

5 months ago
AGNES: Adaptive Graph Neural Network and Dynamic Programming Hybrid Framework for Real-Time Nanopore Seed Chaining Nanopore sequencing enables real-time long-read DNA sequencing with reads exceeding 10 kilobases, but inherent error rates of 12-15 percent present significant computational challenges for read alignm...

AGNES: Adaptive Graph Neural Network and Dynamic Programming Hybrid Framework for Real-Time #Nanopore Seed Chaining arxiv.org/abs/2510.16013
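For context on the seed-chaining step, here is a plain quadratic dynamic-programming chainer in Python, the kind of DP the title refers to. It is not AGNES, and the simple gap cost (`gap_weight` times the diagonal shift between consecutive anchors) is an assumption chosen for readability rather than any particular aligner's scoring.

```python
def chain_anchors(anchors, gap_weight=0.5):
    """Best-scoring colinear chain of anchors (ref_pos, query_pos, length)."""
    anchors = sorted(anchors)
    score = [length for _, _, length in anchors]
    prev = [-1] * len(anchors)
    for i, (xi, yi, wi) in enumerate(anchors):
        for j in range(i):
            xj, yj, wj = anchors[j]
            if xj + wj <= xi and yj + wj <= yi:      # j ends before i on both sequences
                gap = abs((xi - xj) - (yi - yj))     # shift off the diagonal (indels)
                candidate = score[j] + wi - gap_weight * gap
                if candidate > score[i]:
                    score[i], prev[i] = candidate, j
    best = max(range(len(anchors)), key=score.__getitem__)
    best_score = score[best]
    chain = []
    while best != -1:                                # backtrack through predecessors
        chain.append(anchors[best])
        best = prev[best]
    return chain[::-1], best_score


seeds = [(0, 0, 15), (20, 22, 15), (40, 41, 15), (25, 100, 15)]  # last one is spurious
print(chain_anchors(seeds))
# ([(0, 0, 15), (20, 22, 15), (40, 41, 15)], 43.5)
```

The spurious seed lies far off the diagonal, so linking it costs more than it adds and it is left out of the chain.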

5 months ago

Full comic here: www.smbc-comics.com/comic/signal-4 #smbc

6 months ago
Nucleic acids and proteins Big complex molecules are the unique stuff of life. This is how they work

Biological life depends on two families of large molecule: nucleic acids and proteins. The first of our collection of primers explains what they are and how they work

6 months ago

I am looking to get my hands on some #Illumina 5-base methylation data - does anyone have a bam file that I could use for some testing? Please RT for reach!

6 months ago
Bayesian probability, like frequentist probability, is a model-based activity that is mathematically anchored by physical randomization at one end and calibration to a reference set at the other
statmodeling.stat.columbia.edu/2025/10/20/b...

6 months ago
Illustration of Burrows-Wheeler Transform and many auxiliary structures from the input string how$now$brown$cow$#

New tool "bwt-svg" for making illustrations of the BWT and the many auxiliary arrays and other structures related to it. Pyodide-based no-installation-necessary interface here: benlangmead.github.io/bwt-svg/. (H/t to @robert.bio for pointing me to pyodide!) Full repo: github.com/benlangmead/....
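For anyone who wants to reproduce the core structures by hand, here is a naive Python sketch (nothing to do with the tool's own code): a suffix array by plain sorting, the BWT read off it, and inversion via the LF mapping. It assumes the text ends in a unique terminator that sorts before every other character, which holds for how$now$brown$cow$# because '#' (0x23) sorts before '$' (0x24) in ASCII.

```python
def suffix_array(text):
    """Start positions of all suffixes in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text):
    """Character preceding each sorted suffix (text[-1] wraps around for i = 0)."""
    return "".join(text[i - 1] for i in suffix_array(text))


def inverse_bwt(last):
    """Rebuild the text from its BWT using the LF mapping."""
    first_pos = {}                    # first row of each character in column F
    for row, c in enumerate(sorted(last)):
        first_pos.setdefault(c, row)
    seen, rank = {}, []               # rank[row]: earlier occurrences of last[row]
    for c in last:
        rank.append(seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    row, out = 0, []                  # row 0 starts with the terminator
    for _ in last:
        out.append(last[row])
        row = first_pos[last[row]] + rank[row]   # LF step: one character leftwards
    rotated = "".join(reversed(out))  # text rotated to start at the terminator
    return rotated[1:] + rotated[0]


print(suffix_array("banana$"))  # [6, 5, 3, 1, 0, 4, 2]
print(bwt("banana$"))           # annb$aa
text = "how$now$brown$cow$#"
assert inverse_bwt(bwt(text)) == text
```

Sorting whole suffixes is only fine for illustration-sized strings; real indexes build the suffix array with linear or near-linear algorithms.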

6 months ago
Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing Abstract. Polyadenylation is a dynamic process that is important in cellular physiology, which has implications in messenger RNA decay rates, translation e

A paper from Lachlan Coin (@lachlanjmc.bsky.social, not active here for the past month): "Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing" academic.oup.com/gigascience/...

7 months ago
Systematic benchmarking of basecalling models for RNA modification detection with highly multiplexed nanopore sequencing Nanopore direct RNA sequencing (DRS) holds promise for advancing our understanding of the epitranscriptome by detecting RNA modifications in native RNA molecules. Recently, Oxford Nanopore Technologie...

🚨 New preprint alert 🚨
We systematically benchmarked @nanoporetech.com's modification-aware basecalling models released for RNA on sets of in vitro and in vivo sequences and made some curious observations 🧬🔍.
bit.ly/4lXqNul
Follow along for a little recap (1/12)

9 months ago

Claus Wilke on Alphafold and the problem of protein folding in 2025

9 months ago
Go 1.25 interactive tour Fake clock, new GC, flight recorder and more.

Go 1.25 interactive tour

Go 1.25 is scheduled for release in August, so it's a good time to explore what's new.
#golang

antonz.org/go-1-25/

9 months ago

Excited to launch our AlphaGenome API goo.gle/3ZPUeFX along with the preprint goo.gle/45AkUyc describing and evaluating our latest DNA sequence model powering the API. Looking forward to seeing how scientists use it! @googledeepmind

9 months ago
A general substitution matrix for structural phylogenetics. Abstract. Sequence-based maximum likelihood (ML) phylogenetics is a widely used method for inferring evolutionary relationships, which has illuminated the

New paper from the lab, by Sriram Garg in my group. We introduce a general substitution matrix for structural phylogenetics. I think this is a big deal, so read on below if you think deep history is important. academic.oup.com/mbe/advance-...

10 months ago
Bayesian Phylodynamic Inference of Multitype Population Trajectories Using Genomic Data Abstract. Phylodynamic methods provide a coherent framework for the inference of population parameters directly from genetic data. They are an important to

Vaughan & @tanjastadler.bsky.social develop a method to infer multitype population trajectories and apply it to MERS-CoV, revealing transmission patterns between camels and humans.

🔗 doi.org/10.1093/molbev/msaf130

#evobio #molbio #virus

10 months ago

FastGA: Fast Genome Alignment www.biorxiv.org/content/10.1... 🧬🖥️🧪 www.github.com/thegenemyers...

10 months ago

Powerful stuff from @juliosaezrod.bsky.social who found himself on the other end of the process - as a patient not a computational biology researcher - giving him insight into both research and patient perspectives. Huge credit to Julio for talking about his experiences here

10 months ago

Michael Ashburner FRS was an influential figure in the fields of Drosophila genomics and early sequencing database initiatives such as @ebi.embl.org.

Read about their contributions across genetics and bioinformatics in the new biographical memoir: buff.ly/f01zNat

@geneticscam.bsky.social

10 months ago

Preprint on "Improving spliced alignment by modeling splice sites with deep learning". It describes minisplice for modeling splice signals. Minimap2 and miniprot now optionally use the predicted scores to improve spliced alignment.
arxiv.org/abs/2506.12986

10 months ago