I visited Hinxton, UK for an incredible conference (#GenomeInformatics24, thx to @pmelsted.bsky.social, @zaminiqbal.bsky.social, and Nicky Mulder). And I got to see where the magic happens.
#GenomeInformatics24
I had a blast attending the #GenomeInformatics24 conference last week! It was an honor to share some of the work I've been part of in the Bazzini Lab at the Stowers Institute. Incredible research and inspiring discussions.
Grateful for the insights and connections!
Thank you to the #genomeinformatics24 organisers for the opportunity to attend and to present my poster on De Bruijn graph representations of metagenomes (doi.org/10.6084/m9.f...). Had great chats about the future of metagenome assembly, DBGs, and k-mer/unitig indices.
One of the pieces of art on the Wellcome campus, a mirrored sphere, polished to reflect the viewer and the background, celebrating life, “to complement the data under research”.
Thanks to @zaminiqbal.bsky.social, @pmelsted.bsky.social and Nicky Mulder for organising the excellent #GenomeInformatics24 conference at the Wellcome campus. There were lots of really useful talks and posters 🧬🖥️
There are many small downstream ORFs in UTR regions in the human genome. Daniel Aldas Bulas explained the lab work they did to confirm with GFP reporters that at least 5000 of them are translated, and it seems 1/3 of genes in the human genome have translated downstream ORFs. #GenomeInformatics24
Here's our #GenomeInformatics24 poster on evaluating predictors of coding regions on raw reads (rather than on assembled genomes) doi.org/10.6084/m9.f... Why might you want to choose a predictor of coding regions for reads?
We finished yesterday’s #GenomeInformatics24 with @gaetanbenoit.bsky.social describing the metagenome assembler MetaMDBG, now extended to nanopore data as NanoMDBG. It uses chained minimisers, piled up to detect errors.
Daniel Anderson detects AMR gene copy number from long reads without assembly. The software “Amira” builds gene context graphs to resolve copy number, using a reference pangenome to provide context genes. #GenomeInformatics24
Nicholas Maurice is writing Mapler, a tool for assessing the quality of metagenomes (not just the quality of individual bins). Unmapped reads have rarer k-mers (so deeper coverage might help) but that’s not the whole story #GenomeInformatics24
Daria Frolova explains the complexity of finding an edit distance between plasmids with double cut and join events and indels, to construct phylogeny. Using Jaccard, containment and DCJ-indel from her software “Pling”. #GenomeInformatics24
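For anyone unfamiliar with two of the measures mentioned there, here's a minimal sketch of Jaccard similarity and containment on k-mer sets. This is not Pling's implementation (and it does not touch DCJ-indel at all); the function names, the toy sequences and k=4 are my own assumptions for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared k-mers divided by all k-mers seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of a's k-mers that also occur in b (not symmetric)."""
    return len(a & b) / len(a) if a else 0.0


# Toy example: the short sequence is a prefix of the long one.
small = kmers("ACGTACGT", 4)
large = kmers("ACGTACGTTTGACC", 4)

print(jaccard(small, large))      # 0.4
print(containment(small, large))  # 1.0
print(containment(large, small))  # 0.4
```

The toy case shows why both can be useful: a sequence fully embedded in a larger one scores a modest Jaccard but a containment of 1.0.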
Gang Fang talked about his lab’s work making use of PacBio sequencing’s ability to detect methylation. Methylation in bacteria is stable (found at certain motifs), so it can be used to bin contigs in a metagenome, group plasmid sequences with their host, or enrich for rare bacteria when sequencing. #GenomeInformatics24
If you have expression data from different conditions, @wkhuber.bsky.social has LEMUR (Latent embedding multivariate regression) to help show what the expression data would be under other conditions by moving between latent spaces #GenomeInformatics24
Ananyo Choudhury points out in his talk about variant discovery in human genomes that African genomes have mostly been sampled from the African diaspora in the US, and these people mostly came from just a few regions of Africa, with the rest being underrepresented #GenomeInformatics24
Incorporating genomes from diverse populations is the simplest and most reliable way to assess the functional impact of variants. Ananyo Choudhury
#GenomeInformatics24
Novel SNP discovery differs hugely between African populations, Ananyo Choudhury.
Can determine whether putatively deleterious mutations really are deleterious by exploring these.
#GenomeInformatics24
How far we have come! Ananyo Choudhury, at
#GenomeInformatics24
Excellent talk from Esther Woo who works with an inbred fish population to do GWAS and then used neural nets in an encoder-decoder model to do auto-phenotyping from images of the fish and do GWAS on the embedding space, probing the meaning of the latent variables by perturbing them. #GenomeInformatics24
Great talk from Esther Yoo from the @ewanbirney.bsky.social group at EBI on doing GWAS in latent space from neural networks to explore phenotypes in Medaka. Lots of questions on the method, applicability, v well handled!
#GenomeInformatics24
Alexandrina Pancheva experiments with building an LLM for nucleotide seqs with single nucleotides as tokens, in order to explore the effect of mutations in the sequence, scoring log likelihoods and inspecting regions with unusual scores. #GenomeInformatics24
Here's a link to our #GenomeInformatics24 poster on "Finding the most diverse subset of proteins" that was presented yesterday. doi.org/10.6084/m9.f... Why do we want to find the most diverse set of proteins?
From yesterday’s talks at #GenomeInformatics24 I ran out of time to skeet about all of them, but I did enjoy Roland Faure’s sequence sketching that shortened the sequence while still allowing existing bioinf tools to be used on the shortened sequence. He made clear, though, that seq errors were amplified.
Appreciate all the highlights from #GenomeInformatics24 - wish I could be there this year!
Jasmijn Baaijens talking about selecting amplicons and primers to profile viral lineages in metagenomes (AmpliDiff).
#GenomeInformatics24
Can Alkan talking about hardware-software co-design for accelerating read mapping
#GenomeInformatics24
Logan, a 31-mer based search engine for sequences in SRA, was just presented by @pierrepeterlongo.bsky.social and it is huge. 2 centuries of CPU time to make the Bloom filters. Will be available for 6 months for now. #GenomeInformatics24
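To give a feel for the general idea of k-mer search with Bloom filters, here's a toy sketch. I don't know Logan's internals beyond what was said in the talk, so none of this is its actual design: the class, the SHA-256 based hashing, the filter size and the example sequence are all assumptions made up for illustration.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by prefixing the item with a counter.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k=31):
    """List every k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fraction_found(query, bloom, k=31):
    """Share of the query's k-mers that the filter reports as present."""
    hits = [km in bloom for km in kmers(query, k)]
    return sum(hits) / len(hits) if hits else 0.0


# Index one made-up sequence, then query with a substring of it.
indexed = "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGCCATGGATCCGTAAGCTTGACCA"
bloom = BloomFilter()
for km in kmers(indexed):
    bloom.add(km)

query = indexed[5:45]
print(fraction_found(query, bloom))  # 1.0, since inserted k-mers are never missed
```

A query that was never indexed would usually score near zero, but not reliably exactly zero, because Bloom filters trade a small false-positive rate for compactness.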
Caitlin Collins estimating rates of gain/loss of accessory genes in bacteria and integrating the information with timed phylogenies. First time I've seen this kind of thing, comparing across species. Great talk
#GenomeInformatics24
Nice talk by Jonathan Mudge about how hard it is to lift gene annotations from one human genome to another, especially for certain tricky cases. The USP17L gene family example looks pretty wild. #GenomeInformatics24
Now, the start of the pangenome session, a talk from session chair Jana Ebler, showing how you can use a pangenome of known human haplotypes to do short read haplotyping. #GenomeInformatics24
Camille Marchet opened #genomeinformatics24 and summarised recent advances in storage and indexing with k-mers. It’s got me thinking about uses of locations of minimisers, so now I need to read more.
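As a note to self on what "locations of minimisers" means, here's a minimal sketch of window minimisers that records positions. It is a simplification under my own assumptions: it picks the lexicographically smallest k-mer in each window, whereas real tools typically order k-mers by a hash, and the function name and toy sequence are made up.

```python
def minimisers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimisers.

    Each window covers w consecutive k-mers; the smallest k-mer in the
    window is selected, taking the leftmost one on ties.
    """
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kms) - w + 1):
        window = kms[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


result = minimisers("GATTACAGATTACA", k=3, w=4)
print(result)
# [(1, 'ATT'), (4, 'ACA'), (6, 'AGA'), (8, 'ATT'), (11, 'ACA')]
```

Adjacent windows often agree on the same minimiser, so 12 k-mers collapse to 5 sampled positions here.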