How diverse is bacterial immunity?
We report in @science.org how language models allowed us to predict 2.4M antiphage proteins spanning >23K novel potential systems.
👏 @emordret.bsky.social, @alexhv.bsky.social et al. doi.org/10.1126/scie...
Explore them here defensefinder.mdmlab.fr/wiki/refseq_...
Posts by John Lees
Alex Kramer, Alan Zhang and friends posted our preprint today. In it, we introduce Panmap, a tool for phylogenetic placement, assembly, lineage abundance estimation, and eDNA assignment using phylogenetic pangenomes.
www.biorxiv.org/content/10.6...
It might be that transcriptional differences do indeed cause a difference in invasion, but I would expect this to be reflected in the genome too (e.g. eQTL/SNP causing regulatory differences)
Thanks!
I would be surprised if, of two isogenic strains, one caused more invasive disease due to environment alone (given the speed of transcription change vs outbreak timing, and the simplicity of regulation).
i.e. consistent with outbreak with single source, no hypermutation/highly variable regions
Minor update on genomics of the menB outbreak. UKHSA have updated with four more outbreak isolates, and a high quality assembly of the earlier genome.
tl;dr ran ska+gubbins again, same results, all of these genomes look basically identical
www.bacpop.org/blog/menb_up...
A quick rant on people vibe-translating our Rust libraries to other languages
That's the second time in a week that I've seen new bioinformatics tools with a vibe-coded translation of our Rust libraries to C/C++.
I have two major issues with that:
...because I'm also releasing ggCallaroo! This is an easy-to-use Snakemake pipeline that runs ggCaller, Panaroo and then functionally annotates representative proteins using Bakta, all in one tool.
github.com/samhorsfield...
ok sold! We will try it at some point soon then
For these approaches which build from a longer sequence (and use the order of k-mers in construction), I presume this wouldn't really work for e.g. short read data being read 'online' this way?
Or have I misunderstood?
This looks very clever
One use we have a lot is filtering singletons from fastq reads. We probe the filter for each k-mer: if it is not found we add it to the filter, and if it is found we add it to the passed set
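A rough sketch of that singleton-filtering loop, in case it helps to see it written out. Everything here is illustrative rather than the actual implementation: the small Bloom filter stands in for whichever filter is being discussed, and the names `BloomFilter`, `kmers` and `non_singleton_kmers`, the filter size and the hash scheme are all assumptions. It also ignores canonical k-mers and FASTQ parsing, taking plain read sequences as input.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash scheme are illustrative assumptions."""

    def __init__(self, n_bits=1 << 20, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting one hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every k-mer of seq in order (no canonicalisation)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def non_singleton_kmers(reads, k):
    """Return k-mers seen at least twice across the reads.

    First sighting: add the k-mer to the filter.
    Later sightings: the probe hits, so add it to the passed set.
    Bloom false positives can let a few true singletons through.
    """
    seen = BloomFilter()
    passed = set()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen:
                passed.add(kmer)
            else:
                seen.add(kmer)
    return passed


if __name__ == "__main__":
    print(non_singleton_kmers(["ACGTA", "ACGTT"], k=4))
```

Only the k-mers that pass need exact storage, which is the point of probing an approximate filter first: most singletons (typically sequencing errors) never make it into the set.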
We have two paid internships available in our group starting this summer, suitable for master's / pre-PhD students:
- Protein structure search
- Promoter variation
For full details and how to apply see: www.bacpop.org/jobs/
Welcome, @stephlo.bsky.social, our new Team Leader for Protein Function.
Find out how Stephanie’s experience in genomics shaped her approach to protein curation and AI integration.
#ProteinScience #AI
www.embl.org/news/people-...
Here is some initial population analysis on the menB outbreak genome we have done:
www.bacpop.org/blog/menb/
Some recombination in e.g. pilus and porB, but nothing I can see that is hugely unexpected
Metagenomics might be nice too but I don’t think it’s done unless the infection is unknown. More sequences from the outbreak, followed by long read sequencing of the isolates would be the things I’d find most useful right now
There is an assembly, just not the reads (as far as I could tell). I was pleased that this was released quickly and openly; I'm not sure that would have happened pre-Covid. Reads would be nice too, but realistically I think we (including EBI) need to make it easier/faster to share them in outbreaks
Some provisional analysis on one of the genomes from the menB outbreak detailed below:
johnlees.me/posts/menb-o...
Some more thoughts on why this outbreak (large, happening quickly, in one place) might be happening:
johnlees.me/posts/menb-o...
Perhaps strain + immunity + high transmission
That's really interesting, thank you
Thanks, yes phase variation is a good point, definitely a plausible mechanism for genetically more likely invasion
That OR seems massive though – do you have a reference/evidence for that?
From our work on this we definitely saw phase variation associated with invasion, but not to that extent
Probably nothing particularly novel there but was helpful in thinking about why this is happening, whether we should be worried etc
As usual the Science Media Centre has done a good job of collating scientific input: www.sciencemediacentre.org/expert-react...
I did my PhD on bacterial meningitis, finding whether there are genetic factors which make meningitis more likely.
Wrote down some initial thoughts on the current outbreak in Kent: johnlees.me/posts/menb-o...
I tried to think of factors and their likelihood to explain why this is happening now
Thx to @samuelhorsfield.bsky.social @jackietoussaint.bsky.social @theo.io @zaminiqbal.bsky.social @gerrythill.bsky.social
You can now view a tree of 2,399,238 bacterial genomes we made from AllTheBacteria (on the great Taxonium):
taxonium.org/atb
That's a big tree!
(unless you're used to SC2 trees)
New preprint where @lbobay.bsky.social and I were motivated by the fact that non-synonymous substitutions are commonly analyzed in molecular evolution studies, but the similarity of the amino acids being substituted is an understudied area.
doi.org/10.64898/202...
New paper showing that bacteria with more genes for cooperation can live in a broader range of habitats and that genes for cooperation are more likely to be in the accessory genome www.pnas.org/doi/10.1073/... @lauriebelch.bsky.social
except my tokens keep running out
wow what a backdrop!
Delighted to see over 17 million new protein structure predictions from novel proteins in AllTheBacteria are now integrated into the AlphaFold Database at @ebi.embl.org!
Huge work from @gbouras13.bsky.social @oschwengers.bsky.social and friends to generate these.
www.ebi.ac.uk/about/news/u...
Congratulations! (and also happy to know these cryptic messages are so universal. Do I mean happy really? Well it made me feel better about them at least)