Bacterial rRNAs are highly modified, but the functions of these modifications are often mysterious. Zac Park says that methylation of 16S rRNA by MraW (RsmH) enhances translation of structured mRNAs. Thus, mRNA structure and rRNA modifications likely co-evolved to fine-tune protein dosage.
www.biorxiv.org/content/10.6...
Posts by Steven Robbins
E.g., in my last benchmark, geNomad picked up about 2.4x more plasmids in LR than in SR data from the same sample, and ~6x more if you use conservative identification settings. Again, I assume that's because you can gather more markers from longer reads.
Tangentially, you detect many more viruses, plasmids, etc. using long reads for the same reason. Classifiers depend on having enough context to make a definitive call, and long reads provide that. The relative benefit of "more but shorter sequences" vs "more context/LRs" is an interesting question.
Classification of random assortments of short sequences is hard, right? I believe the studies I've seen show that you can classify more long reads as a % of the data because each read has more context. Would be interesting to see that study.
@sewerynoz.bsky.social ;-)
I think I get what you're saying. Like, in the case where you have, say, 20Gbp of both short- and long-read data, you get more short reads, so you could detect things at lower abundance; long reads should give better resolution but maybe a higher detection limit? I think there's another consideration...
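To put rough numbers on the read-count side of that trade-off, here is a minimal Python sketch. The 150 bp and 10 kb read lengths, the 0.0001% abundance, the 3-read detection threshold, and the Poisson sampling model are all illustrative assumptions, not values from the thread.

```python
import math


def expected_reads(total_bases: float, mean_read_len: float, rel_abundance: float) -> float:
    """Expected reads from one taxon, assuming reads are sampled in proportion
    to the taxon's share of sequenced bases and all reads share one length."""
    n_reads = total_bases / mean_read_len
    return n_reads * rel_abundance


def p_detect(expected: float, min_reads: int) -> float:
    """Poisson probability of observing at least `min_reads` reads from the taxon."""
    # P(X >= k) = 1 - sum_{i=0}^{k-1} e^(-lam) * lam^i / i!
    below = sum(
        math.exp(-expected) * expected**i / math.factorial(i) for i in range(min_reads)
    )
    return 1.0 - below


TOTAL_BASES = 20e9  # 20 Gbp of each data type, as in the post
ABUNDANCE = 1e-6    # hypothetical taxon making up 0.0001% of sequenced bases
MIN_READS = 3       # hypothetical detection threshold

for label, read_len in [("short reads (150 bp)", 150), ("long reads (10 kb)", 10_000)]:
    lam = expected_reads(TOTAL_BASES, read_len, ABUNDANCE)
    print(f"{label}: ~{lam:.0f} expected reads, "
          f"P(>= {MIN_READS} reads) = {p_detect(lam, MIN_READS):.3f}")
```

With these made-up numbers, the short-read run expects about 133 reads from the taxon versus about 2 for long reads. The sketch deliberately ignores the other half of the question: the fraction of reads a classifier can confidently assign, which the posts argue is higher for long reads.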
There are multiple papers showing better taxonomic profiling from long reads, right?
And several papers have now shown substantially higher recovery of HQ MAGs (consistently 2-3x more) from long reads than from short reads? Not as a function of depth, though. I've never considered depth; I just want the most HQ MAGs.
Long-read multi-sample binning could clean "chimerism" up entirely?
That's not to mention it's very common these days to get single-contig complete genomes/cMAGs, especially in gut (not a difficult sample type), that would obviate this problem entirely in many cases.
Lastly, as far as I can tell at a quick read, it looks like multi-sample binning was only applied to the short-read MAGs to test single-sample vs multi-sample. In that case, the LR MAGs were single-sample binned(?), which hasn't been best practice for years. In that case,...
That said, for the few MAGs that contained misbinned contigs, I can't see it reported how many bp of the MAG those contigs made up. Were the offending contigs 0.25% of the whole MAG?
In the narrow section on chimerism where long-read (LR) MAGs were included, LR MAGs actually fared quite well, i.e. 14 of 15 MAGs at 10Gbp and 18 of 22 at 50Gbp were not chimeric at all, even under that fairly low bar for chimerism, and substantially better than short-read MAGs.
In this sense, "chimerism" means mis-binning, which will happen far more in short-read assemblies compared to long-read, simply because short contigs are more likely to misbin.
"Chimeric" here, as I read it, is defined as contig(s) from an assembled MAG covering 0.25% of a different reference genome than its other contigs. E.g: for a 4Mbp ref, 3 contigs totaling 10kb (0.25%) from a MAG match a 2nd reference=chimeric. Fair, but worth considering if that's concerning to you.
From a quick read of this paper (correct me if I'm wrong), I think it's worth pointing out that most of these results are very specific to short-read MAGs. A few comments on the narrow section where long-read MAGs were included because the details seem critical for evaluation...
We've known the gut-brain axis is a key underpinning of Parkinson's disease. Today, for the 1st time, a gut microbiome signature denoting risk has been found in healthy individuals with genetic predisposition
nature.com/articles/s41...
Crab evolution is so cool. Don’t mess with things that look like rocks in tide pools!!
But it really depends on the pathway, the study design, and what claim you’re making. There are certainly cases where differential expression of one gene may be real and interesting.
I suppose my thinking is that it makes sense to treat differential expression of multiple genes in a pathway as multiple lines of evidence that the pathway is genuinely important.
And don’t get me started on studies without a control profile that comment on whether something is “highly expressed.” Highly expressed compared to what?
At best you could say nitrogen fixation may be “different,” without a directional assumption. More likely the nifK significance was a statistical fluke. Or, have you ever looked at a metatranscriptomic read pileup, even within genes? That’ll make you question the whole method. Stacking is wild.
I feel like it depends on the pathway and the comparison set. If you’re looking at different conditions and nifK is differentially expressed, but no other nif genes are, and all are required for function, can you really say “nitrogen fixation is more important in condition 2”?
I have a feeling that many metatranscriptomic studies have produced meaningless results based on the stats of a few genes. That said, looking at pathway-level enrichment stats could be a useful path to meaning. A balance between false positives and negatives.
What does that mean? Especially with genes like nif for nitrogen fixation, which only function as a complex. Surely they all need to be differentially expressed to have any meaning. But we often don’t enforce that criterion to derive meaning.
Interesting… this has always been an issue in metatranscriptomics that I don’t think most people realize lurks there. Try finding differentially expressed genes in your study and then ask the question “are other genes in the same pathway also differentially expressed?” You very often find the answer is no.
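That sanity check is easy to script. Below is a minimal Python sketch; the gene sets are a toy example (nifH, nifD and nifK as the nitrogenase structural genes), and the `all_de` flag encodes the strict "every subunit must be differentially expressed" rule from the thread rather than any established statistical test.

```python
def pathway_coherence(de_genes, pathways):
    """For each pathway, count how many member genes were called differentially expressed.

    de_genes: set of gene IDs your DE analysis flagged (however it defines that).
    pathways: dict of pathway name -> list of member gene IDs.
    """
    report = {}
    for name, members in pathways.items():
        if not members:  # skip empty gene sets rather than divide by zero
            continue
        hits = [g for g in members if g in de_genes]
        report[name] = {
            "n_de": len(hits),
            "n_total": len(members),
            "fraction_de": len(hits) / len(members),
            # Strict rule for obligate complexes: every subunit must be DE.
            "all_de": len(hits) == len(members),
        }
    return report


# Toy example: only nifK is called DE among the nitrogenase structural genes.
de_genes = {"nifK"}
pathways = {"nitrogen_fixation": ["nifH", "nifD", "nifK"]}
for name, stats in pathway_coherence(de_genes, pathways).items():
    print(name, stats)
```

Here nitrogen fixation reports 1 of 3 genes DE and `all_de` is False, which is exactly the situation where a "nitrogen fixation differs" claim deserves skepticism.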
4/ The method is simple and solid:
→ compute gene-level differential signal
→ aggregate into pathways
→ compare to a null via gene randomization
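As a rough illustration of those three steps, here is a generic Python sketch of a gene-randomization enrichment test. It is not pygage's code or API; the mean as the aggregate score, the two-sided comparison, and the toy data are illustrative assumptions.

```python
import random
from statistics import mean


def pathway_enrichment(gene_stats, pathway_genes, n_perm=10_000, seed=0):
    """Gene-randomization test for one pathway.

    gene_stats: dict of gene_id -> gene-level differential statistic.
    pathway_genes: member gene IDs; members missing from gene_stats are ignored.
    Returns (observed_score, two_sided_p).
    """
    members = [g for g in pathway_genes if g in gene_stats]
    if not members:
        raise ValueError("no pathway genes have gene-level statistics")
    all_stats = list(gene_stats.values())
    k = len(members)
    rng = random.Random(seed)

    # Step 2: aggregate gene-level signal into one pathway score (mean statistic).
    observed = mean(gene_stats[g] for g in members)

    # Step 3: null distribution from random gene sets of the same size.
    null = [mean(rng.sample(all_stats, k)) for _ in range(n_perm)]

    # Two-sided: distance from the expected null mean (the mean of all gene stats).
    center = mean(all_stats)
    extreme = sum(abs(s - center) >= abs(observed - center) for s in null)
    p = (extreme + 1) / (n_perm + 1)  # add-one so p is never exactly 0
    return observed, p


# Toy data (step 1 stand-in): 500 genes with N(0, 1) scores, 5 of them shifted up.
data_rng = random.Random(1)
gene_stats = {f"g{i}": data_rng.gauss(0, 1) for i in range(500)}
pathway = ["g1", "g2", "g3", "g4", "g5"]
for g in pathway:
    gene_stats[g] += 3.0
score, p = pathway_enrichment(gene_stats, pathway, n_perm=2_000)
print(f"pathway score = {score:.2f}, p = {p:.4f}")
```

One design caveat: randomizing genes treats them as independent draws, so if genes in a pathway are co-regulated (as operon genes often are), this null can overstate significance. Permuting sample labels is the usual alternative when there are enough samples.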
2/ Gene-level stats ≠ biological interpretation.
You need pathway-level signals.
That’s where gene set enrichment comes in.
But most tools are:
• slow
• fragile with small n
• stuck in R
Most RNA-seq pipelines are doing this wrong.
They run differential expression…
and stop there.
You’re leaving biological signal on the table.
I built pygage to fix that 👇
pip install pygage
#Bioinformatics #rnaseq #computation #computationalbiology
Feels like this paper on protein-templated DNA synthesis by a natural enzyme warrants some comment.
So here's a 🧵. /1
www.science.org/doi/10.1126/...
"Pancreatic cancer mRNA vaccine shows lasting results in an early trial: Scientists caution that more research is needed, but nearly all of the patients who responded to the personalized vaccine are still alive six years later."
Screenshot of Science commentary article: "Scientists stunned by ‘fundamentally new way’ life produces DNA. Newly discovered bacterial defense system challenges genetic code’s central dogma. Image: In a newly discovered bacterial defense system, paired strands of DNA (orange and cyan) are synthesized by two enzymes: One (yellow) uses an RNA template (beige) to guide the assembly of the nucleotide bases that make up DNA, while a second, highly unusual enzyme (light blue) uses its own amino acids as a template."
As a scientist, I am here to report that this headline is indeed accurate as I am in fact STUNNED!!
Wow. This is incredible. 🧪
"Newly discovered bacterial defense system challenges genetic code’s central dogma.":
www.science.org/content/arti...
So apparently viruses are doing protein-primed DNA synthesis now