How diverse is bacterial immunity?
We report in @science.org how language models allowed us to predict 2.4M antiphage proteins spanning >23K novel potential systems.
👏 @emordret.bsky.social, @alexhv.bsky.social et al. doi.org/10.1126/scie...
Explore them here: defensefinder.mdmlab.fr/wiki/refseq_...
Posts by Karel Břinda
ggCallaroo v0.1.0 is now out! This Snakemake pipeline predicts, clusters and annotates bacterial genes using ggCaller, Panaroo and Bakta. It generates Panaroo files with functional annotations already integrated, which can then be used with the usual downstream tools. github.com/samhorsfield...
Two new bioinformatics internships available in @johnlees.bacpop.org group at EMBL-EBI: 1) testing and developing ML methods for identification of bacterial promoter regions; 2) Applying innovations in protein structure prediction to search massive datasets. Apply here: www.bacpop.org/jobs/
Myloasm, our long-read metagenome assembler, is now published! w/ @mgmarin.bsky.social and @lh3lh3.bsky.social
Very rewarding after more than a year of development and countless hours thinking about assembly. Thanks to the beta testers, the Li lab, and the reviewers, who gave very helpful feedback.
rdcu.be/famFj
ggCaller v1.5.0 is out! We've removed the integrated clustering to enable users to benefit from new Panaroo features. Now, ggCaller generates GFFs that can be used with any clustering method. But for fans of an integrated ggCaller pangenome workflow read on...
github.com/bacpop/ggCal...
For those writing code with agents, this *excellent* article by Timo Bingmann (who wrote COBS, for k-mer geeks) is super interesting on how one can conceptualise it in terms of dependencies and how it affects development. Very fun analogies (QWERTY, cooking, money).
panthema.net/2026/0318-Vi...
A really fascinating read – with ideas underlying so many current topics across different subdomains of bioinformatics.
I was pleased to give an interview to @radiopraguefr.bsky.social about my academic journey across Czechia, France, and the United States, and about my research. english.radio.cz/karel-brinda...
I got the chance to feature on this week’s BBC More or Less podcast with the excellent Tom Colls, talking about how scientists count life on Earth, specifically the microbes.
Have a listen: www.bbc.co.uk/programmes/p...
This is really interesting! I wonder how far this extends beyond isolates. MAG dynamics seem to be the (hidden) game changer, with a different regime: 1) many genomes/strains/... per sample, 2) a growing number of MAG reconstructions per sample, 3) many scientific communities moving from isolates to metagenomes.
I should have said: this takes us to about 2.8 million genomes in total. We don't have annotations etc. for the latest data yet; this will be an ongoing process.
Courtesy of @martibartfast.bsky.social, we have a new release of AllTheBacteria which adds another 322,920 assemblies, covering all ENA (Illumina, isolate) prokaryotes to May 2025.
allthebacteria.readthedocs.io/en/latest/ov...
How would you design a *multithreaded*, *concurrent* & *dynamic* hash table if you are focused specifically on common k-mer workloads, where streaming query & insertion are common? Jamshed, Prashant and I explore this in kache-hash, a cache-friendly k-mer hash table!
www.biorxiv.org/content/10.6...
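For readers new to the problem, here is a minimal sketch of the two ingredients the post mentions: k-mers packed into integers (2 bits per base) and a table that several threads can insert into and query at the same time. This is NOT kache-hash's design; it uses plain lock striping (one lock per shard) as a simple, well-known stand-in, and the names StripedKmerTable, encode_kmers and count_reads are hypothetical. Because of Python's GIL it shows the locking structure, not real parallel speedup or cache behaviour.

```python
import threading

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit code per nucleotide


def encode_kmers(seq, k):
    """Yield each k-mer of seq as a 2-bit packed int; k-mers containing non-ACGT are skipped."""
    mask = (1 << (2 * k)) - 1
    code, valid = 0, 0
    for ch in seq:
        b = ENC.get(ch)
        if b is None:          # e.g. 'N': restart the rolling window
            code, valid = 0, 0
            continue
        code = ((code << 2) | b) & mask  # roll the window by one base
        valid += 1
        if valid >= k:
            yield code


class StripedKmerTable:
    """Hypothetical lock-striped k-mer counter: keys are spread over shards,
    each guarded by its own lock, so threads hitting different shards do not block each other."""

    def __init__(self, n_shards=16):
        self.shards = [dict() for _ in range(n_shards)]
        self.locks = [threading.Lock() for _ in range(n_shards)]

    def _shard(self, key):
        return hash(key) % len(self.shards)

    def insert(self, key):
        i = self._shard(key)
        with self.locks[i]:
            d = self.shards[i]
            d[key] = d.get(key, 0) + 1

    def query(self, key):
        i = self._shard(key)
        with self.locks[i]:
            return self.shards[i].get(key, 0)


def count_reads(table, reads, k):
    """Streaming insertion: push every k-mer of every read into the shared table."""
    for read in reads:
        for code in encode_kmers(read, k):
            table.insert(code)


if __name__ == "__main__":
    table = StripedKmerTable()
    batches = [["ACGTACGT"], ["ACGTTT"]]  # toy reads, one batch per thread
    workers = [threading.Thread(target=count_reads, args=(table, b, 3)) for b in batches]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(table.query(next(encode_kmers("ACG", 3))))  # 3
```

Striping trades memory for lower contention: more shards mean fewer threads waiting on the same lock, at the cost of one lock object per shard.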
🧵 New preprint! Our 4-lab team evolved Streptococcus pneumoniae in antibiotic-treated mice of varying immune states and discovered something surprising: bacteria rarely evolved resistance. Instead, they found a different way to survive — by rewiring RNA turnover.
🔗 www.biorxiv.org/content/10.6...
Delighted to see over 17 million new protein structure predictions from novel proteins in AllTheBacteria are now integrated into the AlphaFold Database at @ebi.embl.org !
Huge work from @gbouras13.bsky.social @oschwengers.bsky.social and friends to generate these.
www.ebi.ac.uk/about/news/u...
He may have only barely known about bacteria, and not at all about viruses, but Darwin was right about hating an ill-defined species concept
What's the best place to look up current estimates of how many truncated/non-functional genes each of us has? There was a paper from @dgmacarthur.bsky.social and colleagues around 2014 that gave an estimate from the 1000 Genomes Project (around 40 per person?), but I guess we have better estimates now.
We're also happy to see a second paper out today, led by Nicola de Maio, which develops methods to identify and account for mutation rate variation and recurrent errors.
www.nature.com/articles/s41...
At long last, my final PhD chapter is out: we developed a novel evolutionary simulator of bacterial pangenomes, Pansim, fitting it to data from >600K genomes using a likelihood-free framework, PopPUNK-mod, to explore neutral and adaptive pangenome dynamics www.biorxiv.org/content/10.6...
How do bacterial pangenomes evolve, what controls their dynamics, why do they exist?
Fitting a mechanistic model to 450 species from allthebacteria.org suggests that fast vs slow gene exchange (i.e. the amount of MGEs) is a major differentiating factor, correlated with phylogeny rather than lifestyle.
L3?
A comprehensive survey of genome language models in #bioinformatics academic.oup.com/bib/article/... 🧬🖥️🧪
I am looking for a postdoc to develop high-performance algorithms in computational genomics. Email or DM me if interested. For more information, see hlilab.github.io/vacancies. RTs appreciated!
Just came across the 2021 Turing Lecture. It has a lot of nice observations on the widening gap between compute and memory bandwidth. It advocates "communication-avoiding" algorithms and notes that algorithms can only be future-proof if they scale with threads.
dl.acm.org/doi/10.1145/...
But it has a mathematically well-defined center (a real point) – unlike all other cities.
Congratulations @baym.lol, @brinda.eu and colleagues on the nice work, looks like a great way to identify deletions and deletion-induced fusion genes.
In MTBC, genomic deletions called "regions of difference" have long been used for phylogenetic investigation. Yet I found no citations thereof.
💻 github.com/baymlab/deletion-born-fusion-manuscript
🔧 github.com/aryakaul/prefixsuffix-kmer
Many thanks to co-authors @fernpizza.bsky.social , @brinda.eu & @baym.lol + GenScale/Baym lab! Funded by NIH, Packard, Pew, Sloan & a Chateaubriand Fellowship!
🎉 New year, NEW PREPRINT!
Bacteria exhibit astonishing genetic diversity, but where do new genes come from?
My best friend Arya Kaul (and labmate in the @baym lab) investigates how advantageous deletions can spawn new genes: "deletion-born fusions." 🧵:
New preprint from my lab (with Arya Kaul, @fernpizza.bsky.social, and @brinda.eu), in which we explore new genes hitchhiking on the beneficial deletion that fused them together, and find them in the LTEE, M. Tb/bovis, and across the bacterial tree of life
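A toy illustration of what "deletion-born fusion" means at the sequence level (not the authors' method, and unrelated to the internals of prefixsuffix-kmer): a hypothetical locus holds two short genes separated by a 2-nt spacer. Deleting gene A's stop codon together with the spacer places gene B's codons in frame with gene A's start codon, so a naive ORF scan finds one longer ORF instead of two. The sequences, coordinates and the ORF rule (ATG to first in-frame stop, forward strand only) are invented simplifications.

```python
STOPS = {"TAA", "TAG", "TGA"}


def forward_orfs(seq):
    """Return (start, end) of ATG...stop ORFs on the forward strand, end exclusive and including the stop."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOPS:
                orfs.append((start, i + 3))    # close it at the first in-frame stop
                start = None
    return sorted(orfs)


def delete(seq, start, length):
    """Return seq with seq[start:start + length] removed."""
    return seq[:start] + seq[start + length:]


# Hypothetical toy locus: gene A, a 2-nt spacer, gene B (all invented).
GENE_A = "ATGGCTGCTTAA"   # ATG GCT GCT TAA
SPACER = "CC"
GENE_B = "ATGAAAAAATGA"   # ATG AAA AAA TGA
LOCUS = GENE_A + SPACER + GENE_B

# Delete gene A's stop codon plus the spacer (5 nt from offset 9),
# which brings gene B's codons into gene A's reading frame.
FUSED = delete(LOCUS, 9, 5)

if __name__ == "__main__":
    print(forward_orfs(LOCUS))  # [(0, 12), (14, 26)]
    print(forward_orfs(FUSED))  # [(0, 21)]
```

Gene B's old start codon survives inside the fused ORF as an internal ATG (methionine). In real genomes the deletion only produces a fusion when the surviving downstream sequence happens to land in frame, which is why such events are rare.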
We're organising a microbes & deep learning session at SMBE next year. Looking forward to seeing your abstracts!