@cellpress.bsky.social
We submitted a presubmission inquiry on 9/12 and followed up on 9/24, but we have not received a response. Is this typical? Could you please help us? We are trying to confirm whether we should submit as a Matters Arising or as a research article.
www.biorxiv.org/content/10.1...
Posts by Justin Silverman
Our analysis is the largest to date. We used our newly created MUTT database, which consists of over 15,000 samples from over 30 studies, each with paired sequence counts and microbial load measurements.
Core takeaway: it's important to accurately model uncertainty and error.
@ggloor.bsky.social
New Paper!
Machine learning models that attempt to predict microbial load collapse outside of their training context, with R2 < 0!
In contrast, our Bayesian Partially Identified Models embrace uncertainty in unmeasured microbial load and consistently outperform them.
www.biorxiv.org/content/10.1...
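For anyone wondering how R2 can be negative: it happens whenever a model's squared error is larger than that of simply predicting the mean of the new data. Here is a minimal sketch with made-up numbers (the values and variable names are hypothetical, not taken from the paper), assuming a model whose predictions are shifted by the load level of its training context.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical log10 microbial loads measured in a new study.
y_true = [10.0, 10.5, 11.0, 11.5, 12.0]

# Hypothetical predictions that track the trend but sit about two
# log units too low, as if calibrated to a different study.
y_shifted = [8.0, 8.4, 9.1, 9.4, 10.1]

# Baseline: predict the mean of the new data for every sample.
y_mean = [sum(y_true) / len(y_true)] * len(y_true)

r2_shifted = r_squared(y_true, y_shifted)
r2_mean = r_squared(y_true, y_mean)

print(f"shifted model R2: {r2_shifted:.2f}")
print(f"mean baseline R2: {r2_mean:.2f}")
```

The shifted predictions follow the ordering of the samples almost perfectly, yet R2 is well below zero because the offset dominates the error.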
Excited to summarize our most recent paper, "Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2", on controlling the false discovery rate (FDR) when analyzing high-throughput sequencing (HTS) data. This has been an open problem since the dawn of HTS.
New preprint!
PCR bias doesn’t just distort relative abundances—it reshapes microbiome ecological analyses.
We show that commonly used diversity metrics (e.g., UniFrac or Shannon) are not robust to amplification bias, while perturbation-invariant alternatives are.
www.biorxiv.org/content/10.1...
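A toy illustration of the idea, under assumptions that are mine rather than the preprint's: PCR bias is modeled as a fixed per-taxon multiplicative efficiency (a compositional perturbation), and the Aitchison distance stands in as one example of a perturbation-invariant metric. All numbers are made up.

```python
import math


def closure(x):
    """Rescale positive values so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def perturb(props, bias):
    """Apply per-taxon multiplicative bias, then renormalize."""
    return closure([p * b for p, b in zip(props, bias)])


def shannon(props):
    """Shannon diversity (natural log)."""
    return -sum(p * math.log(p) for p in props if p > 0)


def clr(props):
    """Centered log-ratio transform."""
    logs = [math.log(p) for p in props]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


def aitchison(p, q):
    """Euclidean distance between CLR-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(p), clr(q))))


# Hypothetical true compositions of two samples.
sample_a = [0.5, 0.3, 0.2]
sample_b = [0.2, 0.3, 0.5]

# Hypothetical amplification efficiencies, shared by both samples.
bias = [4.0, 1.0, 0.25]

obs_a = perturb(sample_a, bias)
obs_b = perturb(sample_b, bias)

shannon_shift = shannon(obs_a) - shannon(sample_a)
dist_true = aitchison(sample_a, sample_b)
dist_obs = aitchison(obs_a, obs_b)

print(f"Shannon change in sample A: {shannon_shift:.3f}")
print(f"Aitchison distance, true vs observed: {dist_true:.6f} vs {dist_obs:.6f}")
```

The shared bias adds the same vector to both samples in CLR space, so it cancels in the distance, while Shannon diversity of each sample moves with the bias.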
Thanks! We think so. I think this will help enhance the cost-effectiveness and efficiency of biomarker discovery. Our methods greatly enhance the positive predictive value of analyses, reducing false signals that cost money to validate and detecting true signals that would otherwise be missed.
New Paper:
We relax normalizations to produce statistical methods for bioinformatics that are much more robust and powerful. We see FDR drop from 45% to 5% with increases in power!
This adds to our ongoing work on Scale Reliant Inference.
link.springer.com/article/10.1...
Our paper explaining why Gihawi et al. failed to prove an error in the normalization used by the 2020 cancer #microbiome analysis is now out as a Matters Arising in @asm.org #mSystems (w/ @george-austin.bsky.social) 🖥️ 🧬
Thread explaining the key points below.
journals.asm.org/doi/10.1128/...
@ggloor.bsky.social
Scale models are not just heuristics but have a rich theoretical foundation based on Bayesian Partially Identified Models. That theory is presented here:
arxiv.org/abs/2201.03616
We are also developing a new ALDEx3 library that is about 1000 times faster than ALDEx2 and has a streamlined user interface (although it's still in beta, I am using it regularly).
github.com/jsilve24/ALD...
To facilitate adoption, we've updated the popular ALDEx2 software package on Bioconductor to support scale model analysis.
In real data analyses and simulation studies, we find our methods often lead to dramatic decreases in false positives (FDR can drop from >75% to a nominal 5%) while simultaneously maintaining or improving statistical power.
We present scale models, which extend normalization by modeling potential errors in these assumptions (reducing false positives), or by allowing researchers to make more biologically plausible assumptions (reducing false negatives).
Traditional normalization methods often make implicit assumptions about the biological system's scale, such as microbial load or total RNA content. These assumptions can lead to false positives and negatives.
New paper in Genome Biology!
genomebiology.biomedcentral.com/articles/10....
We introduce scale models, a generalization of normalizations that explicitly accounts for uncertainty in biological system scale (e.g., microbial load).
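A rough sketch of the idea in this thread, not the ALDEx2 implementation: a CLR-style normalization fixes the between-group change in log scale at one value, while a scale model puts a distribution around it. Here that uncertainty is assumed to be Normal(0, gamma^2) on the log fold change in scale. The data, function names, and the choice to skip sampling uncertainty in the proportions are all simplifications of mine.

```python
import math
import random


def clr(props):
    """Centered log-ratio transform of one sample's proportions."""
    logs = [math.log(p) for p in props]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


def lfc_draws(ctrl, trt, taxon, gamma, n_draws=4000, seed=1):
    """Monte Carlo draws of one taxon's log fold change.

    gamma = 0 reproduces the CLR answer exactly; gamma > 0 adds
    Normal(0, gamma^2) uncertainty in how much total scale differs
    between groups (an assumed, simplified scale model).
    """
    rng = random.Random(seed)
    mean_ctrl = sum(clr(s)[taxon] for s in ctrl) / len(ctrl)
    mean_trt = sum(clr(s)[taxon] for s in trt) / len(trt)
    clr_lfc = mean_trt - mean_ctrl
    return [clr_lfc + rng.gauss(0.0, gamma) for _ in range(n_draws)]


def interval(draws, lower=0.025, upper=0.975):
    """Empirical 95% interval from sorted draws."""
    ordered = sorted(draws)
    n = len(ordered)
    return ordered[int(lower * n)], ordered[int(upper * n) - 1]


# Hypothetical proportions for three taxa, two samples per group.
ctrl = [[0.50, 0.30, 0.20], [0.48, 0.32, 0.20]]
trt = [[0.60, 0.25, 0.15], [0.62, 0.24, 0.14]]

lo_fixed, hi_fixed = interval(lfc_draws(ctrl, trt, taxon=0, gamma=0.0))
lo_scale, hi_scale = interval(lfc_draws(ctrl, trt, taxon=0, gamma=0.5))

print(f"normalization only (gamma=0): [{lo_fixed:.2f}, {hi_fixed:.2f}]")
print(f"with scale uncertainty (gamma=0.5): [{lo_scale:.2f}, {hi_scale:.2f}]")
```

With gamma = 0 the interval collapses to a single positive value, which reads as a confident increase. Allowing for scale uncertainty widens the interval to include zero, which is the mechanism by which false positives drop in this toy setting.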
🚨PA colleagues:
"Senator Fetterman wants to hear from you about how the federal funding freeze is affecting Pennsylvania."
"If your project has been impacted, please fill out our constituent impact form:" forms.office.com/g/mFv2JAPxpC
Get out your Other Support and share that info!
The National Institutes of Health had to stop considering new grant applications, delaying funding for research into diseases ranging from heart disease and cancer to Alzheimer's and allergies.
Our whole point is that there is information missing from the data -- overcoming that requires additional thought and careful consideration of what assumptions are biologically plausible in a particular study. For example, when studying antibiotics, microbial load likely decreases post-treatment.
An important point if you look to benchmark our methods: normalizations are kinda "point and click", with no additional thought needed by the user. We can generalize normalizations, and that helps reduce false positives. But the real advances, where we see the massive FN/FP decreases, come when care is taken.
Love it! Will definitely check that out as it would be super helpful for us. And yes, our methods are not yet common (though they are available in ALDEx2 now!). Reviewers have been resistant as they love normalizations and our methods seem foreign.
Non-linear additive regression (using scalable Bayesian Multinomial Logistic Normal models) is now available in fido (on CRAN)!
neurips.cc/virtual/2024...
Also includes extremely fast marginal likelihood estimation for hyperparameter tuning.
cran.r-project.org/web/packages...
This builds on our prior work
jmlr.org/papers/v23/1...
where we introduced the CU Sampler for Bayesian MLN models. This is even 1-2 orders of magnitude faster than those methods while still being extremely accurate.
New paper was recently accepted to AISTATS
arxiv.org/abs/2410.05548
Flexible Multinomial Logistic-Normal time series models (state space models) that scale to extremely large datasets. Inference is 5-6 orders of magnitude faster than alternatives. R package will soon be released.
In short, we have already made public a fair amount of benchmarking studies against real data. Your manuscript just didn't cite any of it.
New results soon to be released:
We have developed specialized PIMs that account for uncertainty in sparsity assumptions. 6 datasets with ground truth, comparing against 8 methods. When our assumptions hold (first 4 datasets), our methods do well. When they are violated (last 2), our methods fail gracefully.
www.nature.com/articles/s41...
Here is a completely independent group validating our methods for glycomics. Again, the main conclusion: the problem lies in normalizations, and scale uncertainty is critical.