Justin Silverman (@inschool4life) Bsky

Uncertainty Modeling Outperforms Machine Learning for Microbiome Data Analysis Microbiome sequencing measures relative rather than absolute abundances, providing no direct information about total microbial load. Normalization methods attempt to compensate, but rely on strong, of...

@cellpress.bsky.social
We submitted a presubmission inquiry on 9/12 and followed up on 9/24, but have not received a response. Is this typical? Could you please help us? We are trying to confirm whether we should submit as a Matters Arising or as a research article.
www.biorxiv.org/content/10.1...

6 months ago

Our analysis is the largest to date: we used our newly created MUTT database, which consists of over 15,000 samples from over 30 studies, each with paired sequence counts and microbial load measurements.

Core takeaway: it's important to accurately model uncertainty and error.

@ggloor.bsky.social

7 months ago

New Paper!

Machine learning models that attempt to predict microbial load collapse outside of their training context, with R² < 0!

In contrast, our Bayesian Partially Identified Models embrace uncertainty in unmeasured microbial load and consistently outperform them.

www.biorxiv.org/content/10.1...

7 months ago
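R² below zero has a concrete meaning: the predictions fit the held-out data worse than a constant prediction at the held-out mean would. A minimal sketch of the standard coefficient of determination, with invented numbers standing in for held-out log10 microbial loads (they are illustrative only, not data from the preprint):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out log10 microbial loads and predictions from a model
# trained in a different study context (illustrative numbers only).
observed = [10.0, 10.5, 11.0, 11.5, 12.0]
predicted = [11.8, 11.2, 10.4, 10.9, 10.1]

print(r_squared(observed, predicted))  # about -2.22: worse than predicting the mean
mean_only = [sum(observed) / len(observed)] * len(observed)
print(r_squared(observed, mean_only))  # 0.0: the constant-mean baseline
```

Any model scoring below the constant-mean baseline has R² < 0, which is why a negative value signals a collapse rather than merely a weak fit.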

Excited to summarize our most recent paper, "Explicit Scale Simulation for analysis of RNA-sequencing count data with ALDEx2" on controlling the false discovery rate (FDR) when analyzing high throughput sequencing (HTS) data. This has been an open problem since the dawn of HTS.

8 months ago
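The idea behind scale simulation can be sketched as replacing one fixed normalization with random draws of the unmeasured scale. The code below is a hypothetical NumPy illustration, not ALDEx2's implementation: it assumes a Dirichlet(counts + 0.5) step for sequencing uncertainty, a CLR-implied baseline scale, and Gaussian log2 scale noise whose standard deviation is named `gamma` here by analogy with ALDEx2's scale-uncertainty setting (the function name and defaults are ours).

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_sim_lfc(counts_a, counts_b, gamma=0.5, n_mc=1000):
    """Monte Carlo log2 fold changes (b vs a) with simulated scale uncertainty.

    Hypothetical sketch: one sample per condition, for brevity.
    """
    counts_a = np.asarray(counts_a, dtype=float)
    counts_b = np.asarray(counts_b, dtype=float)
    # Sequencing uncertainty: Dirichlet draws of proportions (pseudocount 0.5).
    p_a = rng.dirichlet(counts_a + 0.5, size=n_mc)
    p_b = rng.dirichlet(counts_b + 0.5, size=n_mc)
    # CLR-implied log2 scale (minus the mean log2 proportion) plus scale noise.
    log_scale_a = -np.log2(p_a).mean(axis=1) + rng.normal(0.0, gamma, n_mc)
    log_scale_b = -np.log2(p_b).mean(axis=1) + rng.normal(0.0, gamma, n_mc)
    # Log2 absolute abundance draws = log2 proportion + log2 scale.
    log_w_a = np.log2(p_a) + log_scale_a[:, None]
    log_w_b = np.log2(p_b) + log_scale_b[:, None]
    return log_w_b - log_w_a  # shape (n_mc, n_taxa)

counts_a = [120, 30, 500, 5]
counts_b = [100, 80, 450, 6]
for g in (0.0, 1.0):
    lfc = scale_sim_lfc(counts_a, counts_b, gamma=g)
    lo, hi = np.percentile(lfc, [2.5, 97.5], axis=0)
    print(g, np.round(hi - lo, 2))  # interval width per taxon
```

With `gamma=0` the scale draws sit exactly on the CLR-implied scale, so the spread comes only from Dirichlet sampling; larger values widen every taxon's interval. Real analyses compare groups of samples (ALDEx2 itself runs tests across Monte Carlo instances); one sample per condition keeps the sketch short.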

PCR Bias Impacts Microbiome Ecological Analyses Polymerase Chain Reaction (PCR) is a critical step in amplicon-based microbial community profiling, allowing the selective amplification of marker genes such as 16S rRNA from environmental or host-ass...

New preprint!

PCR bias doesn’t just distort relative abundances—it reshapes microbiome ecological analyses.

We show that commonly used diversity metrics (e.g., UniFrac or Shannon) are not robust to amplification bias, while perturbation-invariant alternatives are.

www.biorxiv.org/content/10.1...

8 months ago
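A common way to model PCR bias is as a multiplicative perturbation: each taxon's true proportion is multiplied by a taxon-specific amplification efficiency and then renormalized. Under that assumption (ours, for illustration), a quick sketch shows why Shannon diversity moves while the Aitchison distance does not. The Aitchison distance (Euclidean distance between CLR-transformed compositions) is one standard perturbation-invariant metric, not necessarily the one used in the preprint; the bias values are hypothetical.

```python
import numpy as np

def closure(x):
    """Rescale a vector of positive parts to proportions summing to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(x, bias):
    """Apply a multiplicative (PCR-like) bias and renormalize."""
    return closure(np.asarray(x, dtype=float) * np.asarray(bias, dtype=float))

def shannon(p):
    p = closure(p)
    return float(-(p * np.log(p)).sum())

def clr(p):
    logp = np.log(closure(p))
    return logp - logp.mean()

def aitchison(x, y):
    return float(np.linalg.norm(clr(x) - clr(y)))

x = closure([0.50, 0.30, 0.15, 0.05])
y = closure([0.25, 0.25, 0.25, 0.25])
bias = [3.0, 1.0, 0.5, 2.0]  # hypothetical per-taxon amplification efficiencies

print(shannon(x), shannon(perturb(x, bias)))  # Shannon changes under bias
print(aitchison(x, y), aitchison(perturb(x, bias), perturb(y, bias)))  # equal up to rounding
```

The invariance follows because `clr(perturb(x, b)) == clr(x) + clr(b)`, so a bias shared by both samples cancels in the difference.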

Thanks! We think so. I think this will help improve the cost-effectiveness and efficiency of biomarker discovery: our methods greatly enhance the positive predictive value of analyses, reducing false signals that cost money to validate and detecting true signals that would otherwise be missed.

8 months ago

Replacing normalizations with interval assumptions enhances differential expression and differential abundance analyses - BMC Bioinformatics Background Methods for differential expression and differential abundance analysis often rely on normalization to address sample-to-sample variation in sequencing depth. However, normalizations imply ...

New Paper:

We relax normalizations to produce statistical methods for bioinformatics that are much more robust and powerful. We see FDR drop from 45% to 5% with increases in power!

This adds to our ongoing work on Scale Reliant Inference.

link.springer.com/article/10.1...

9 months ago
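One way to picture "interval assumptions": instead of assuming the log2 change in total scale between conditions is exactly zero (as total-sum scaling implicitly does), assume only that it lies in an interval, and report a taxon only if its fold change keeps the same sign across the whole interval. This is a minimal sketch of that logic that ignores sampling noise entirely; the function names, bounds, and numbers are hypothetical, and it is not the paper's estimator.

```python
def interval_lfc(lfc_proportions, scale_lo, scale_hi):
    """Bounds on the log2 fold change in absolute abundance.

    log2 absolute LFC = log2 LFC of proportions + log2 LFC of total scale,
    so an interval on the scale term becomes an interval on the result.
    """
    return lfc_proportions + scale_lo, lfc_proportions + scale_hi

def call(lfc_proportions, scale_lo, scale_hi):
    """Report a direction only if it holds across the entire interval."""
    lo, hi = interval_lfc(lfc_proportions, scale_lo, scale_hi)
    if lo > 0:
        return "up"
    if hi < 0:
        return "down"
    return "inconclusive"

# Hypothetical proportion-based log2 fold changes for three taxa.
observed = {"taxon_A": 2.5, "taxon_B": -0.4, "taxon_C": -1.8}

for name, lfc in observed.items():
    # Point assumption (scale change exactly 0) vs. interval assumption [-1, 1].
    print(name, call(lfc, 0.0, 0.0), call(lfc, -1.0, 1.0))
```

Here taxon_B is called "down" under the point assumption but becomes inconclusive once a one-log2-unit change in scale is allowed, which is the kind of call that drives false positives when the point assumption is wrong.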

Our paper explaining why Gihawi et al. failed to prove an error in the normalization used by the 2020 cancer #microbiome analysis now out as a Matters Arising in @asm.org #mSystems (w/ @george-austin.bsky.social) 🖥️ 🧬

Thread explaining the key points below.

journals.asm.org/doi/10.1128/...

11 months ago

@ggloor.bsky.social

11 months ago

Scale Reliant Inference Many scientific fields, including human gut microbiome science, collect multivariate count data where the sum of the counts is unrelated to the scale of the underlying system being measured (e.g., tot...

Scale models are not just heuristics but have a rich theoretical foundation based on Bayesian Partially Identified Models. That theory is presented here:

arxiv.org/abs/2201.03616

11 months ago

GitHub - jsilve24/ALDEx3 Contribute to jsilve24/ALDEx3 development by creating an account on GitHub.

We are also developing a new ALDEx3 library that is about 1000 times faster than ALDEx2, with a streamlined user interface (although it's still in beta, I am using it regularly).
github.com/jsilve24/ALD...

11 months ago

To facilitate adoption, we've updated the popular ALDEx2 software package on Bioconductor to support scale model analysis.

11 months ago

In real data analyses and simulation studies, we find our methods often lead to dramatic decreases in false positives (FDR can drop from >75% to a nominal 5%) while simultaneously maintaining or improving statistical power.

11 months ago

We present scale models, which extend normalization by modeling potential errors in these assumptions (reducing false positives) or by allowing researchers to make more biologically plausible assumptions (reducing false negatives).

11 months ago
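In the simplest two-condition case, a scale model can be pictured as a distribution over the log2 change in total scale (e.g., microbial load) between conditions. In this simplified sketch, `mu=0, tau=0` stands in for a normalization such as total-sum scaling (a point assumption); raising `tau` models error in that assumption, and moving `mu` encodes a different, biologically motivated assumption (for example, that load drops after treatment). The function name and numbers are hypothetical, and real scale models operate on posterior or Monte Carlo samples rather than a single point estimate.

```python
import numpy as np

def scale_model_interval(lfc_proportions, mu=0.0, tau=0.0, n_draws=10_000, seed=1):
    """95% interval for the log2 fold change in absolute abundance.

    lfc_proportions: log2 fold change estimated from relative abundances.
    mu, tau: mean and sd of an assumed normal distribution on the log2
    change in total scale between conditions (hypothetical, simplified).
    """
    rng = np.random.default_rng(seed)
    scale_draws = rng.normal(mu, tau, n_draws)
    lfc_abs = lfc_proportions + scale_draws
    lo, hi = np.percentile(lfc_abs, [2.5, 97.5])
    return float(lo), float(hi)

lfc = -0.8  # hypothetical proportion-based log2 fold change for one taxon

print(scale_model_interval(lfc))                    # point assumption: degenerate interval at -0.8
print(scale_model_interval(lfc, tau=0.5))           # added uncertainty: interval now spans 0
print(scale_model_interval(lfc, mu=-1.0, tau=0.5))  # assumed drop in load: clearly negative
```

The second call shows how uncertainty can remove a call that rested entirely on the point assumption, and the third shows how a different assumption can recover a decrease the normalization would understate.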

Traditional normalization methods often make implicit assumptions about the biological system's scale, such as microbial load or total RNA content. These assumptions can lead to false positives and false negatives.

11 months ago
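A toy illustration of such an implicit assumption, with invented numbers: one taxon grows fivefold, so total load doubles, and proportions alone make every other taxon look depleted even though its absolute abundance did not change. Comparing proportions directly amounts to assuming equal totals, so those apparent decreases would be reported as real.

```python
import numpy as np

# Hypothetical absolute abundances (e.g., cells per gram) for four taxa.
before = np.array([100.0, 100.0, 100.0, 100.0])
after = np.array([500.0, 100.0, 100.0, 100.0])  # only taxon 0 changes

def log2_fc(a, b):
    return np.log2(b / a)

true_lfc = log2_fc(before, after)
prop_lfc = log2_fc(before / before.sum(), after / after.sum())

print(true_lfc)  # taxon 0 up, taxa 1-3 unchanged (0)
print(prop_lfc)  # taxa 1-3 appear to drop by one log2 unit
# The gap equals the log2 change in total load, identical for every taxon.
print(true_lfc - prop_lfc, np.log2(after.sum() / before.sum()))
```

Because the error is the same unmeasured scale term for every taxon, no amount of extra sequencing depth removes it; it has to be handled by an assumption about scale.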

Incorporating scale uncertainty in microbiome and gene expression analysis as an extension of normalization - Genome Biology Statistical normalizations are used in differential analyses to address sample-to-sample variation in sequencing depth. Yet normalizations make strong, implicit assumptions about the scale of biologic...

New paper in Genome Biology!

genomebiology.biomedcentral.com/articles/10....

We introduce scale models, a generalization of normalizations that explicitly accounts for uncertainty in biological system scale (e.g., microbial load).

11 months ago

Microsoft Forms

🚨PA colleagues:

"Senator Fetterman wants to hear from you about how the federal funding freeze is affecting Pennsylvania."

"If your project has been impacted, please fill out our constituent impact form:" forms.office.com/g/mFv2JAPxpC

Get out your Other Support and share that info!

1 year ago

NIH funding freeze stalls applications on $1.5 billion in medical research funds The National Institutes of Health had to stop considering new grant applications, delaying funding for research into diseases ranging from heart disease and cancer to Alzheimer's and allergies.

1 year ago

Our whole point is that there is information missing from the data; overcoming that requires additional thought and careful consideration of which assumptions are biologically plausible in a particular study (e.g., when studying antibiotics, microbial load likely decreases post-treatment).

1 year ago

An important point if you look to benchmark our methods: normalizations are kinda "point and click", with no additional thought needed by the user. We can generalize normalizations, and that helps reduce false positives. But the real advances, where we see massive FN/FP decreases, come when care is taken.

1 year ago

Love it! Will definitely check that out, as it would be super helpful for us. And yes, our methods are not yet common (though they are available in ALDEx2 now!). Reviewers have been resistant, as they love normalizations and our methods seem foreign.

1 year ago

NeurIPS Efficient Bayesian Additive Regression Models For Microbiome and Gene Expression Studies NeurIPS 2024

Non-linear additive regression (using scalable Bayesian Multinomial Logistic Normal models) is now available in fido (on CRAN)!
neurips.cc/virtual/2024...

Also includes extremely fast marginal likelihood estimation for hyperparameter tuning.
cran.r-project.org/web/packages...

1 year ago
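For readers new to Multinomial Logistic-Normal (MLN) models, the generative structure can be sketched in a few lines. This is a simplified linear version written with NumPy, not fido's API and not its non-linear additive extension; the dimensions, covariance, and the choice of an inverse additive log-ratio (ALR) link with the last taxon as reference are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def alr_inv(eta):
    """Inverse ALR: (n, D-1) log-ratios -> (n, D) proportions, last taxon as reference."""
    expo = np.exp(np.column_stack([eta, np.zeros(eta.shape[0])]))
    return expo / expo.sum(axis=1, keepdims=True)

def simulate_mln(X, Lambda, Sigma, depths):
    """eta_j ~ N(Lambda x_j, Sigma); Y_j ~ Multinomial(depth_j, alr_inv(eta_j))."""
    mean = X @ Lambda.T  # (n_samples, D-1)
    noise = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=X.shape[0])
    pi = alr_inv(mean + noise)
    return np.array([rng.multinomial(n, p) for n, p in zip(depths, pi)])

n, D = 6, 4
X = np.column_stack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])  # intercept + group
Lambda = rng.normal(0.0, 1.0, size=(D - 1, X.shape[1]))            # regression coefficients
Sigma = 0.5 * np.eye(D - 1)                                        # log-ratio covariance
depths = rng.integers(1_000, 5_000, size=n)                        # arbitrary sequencing depths

Y = simulate_mln(X, Lambda, Sigma, depths)
print(Y)
print(Y.sum(axis=1) == depths)  # totals equal the depth, which carries no scale information
```

The multinomial layer captures counting noise at finite depth, while the logistic-normal layer carries covariate effects and extra biological variation on the log-ratio scale.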

This builds on our prior work
jmlr.org/papers/v23/1...
where we introduced the CU Sampler for Bayesian MLN models. This is even 1-2 orders of magnitude faster than those methods while still being extremely accurate.

1 year ago

Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models Many scientific fields collect longitudinal count compositional data. Each observation is a multivariate count vector, where the total counts are arbitrary, and the information lies in the relative fr...

New paper was recently accepted to AIStats

arxiv.org/abs/2410.05548

Flexible Multinomial Logistic-Normal time series models (state space models) that scale to extremely large datasets. Inference is 5-6 orders of magnitude faster than alternatives. An R package will be released soon.

1 year ago
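An MLN dynamic linear model adds a latent state that evolves over time. Below is a minimal generative sketch of the simplest case, a random-walk "local level" on ALR coordinates; the parameter names and values are our assumptions, and the code only shows the model structure, not the paper's inference method.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_mln_dlm(T, D, depth, W_sd=0.2, V_sd=0.1):
    """Local-level MLN DLM (hypothetical parameterization):

    theta_t = theta_{t-1} + w_t,  w_t ~ N(0, W_sd^2 I)   (state evolution)
    eta_t   = theta_t + v_t,      v_t ~ N(0, V_sd^2 I)   (observation noise, ALR scale)
    y_t     ~ Multinomial(depth, alr_inv(eta_t))
    """
    theta = np.zeros(D - 1)
    counts = np.empty((T, D), dtype=int)
    for t in range(T):
        theta = theta + rng.normal(0.0, W_sd, D - 1)
        eta = theta + rng.normal(0.0, V_sd, D - 1)
        expo = np.exp(np.append(eta, 0.0))  # inverse ALR, last taxon as reference
        counts[t] = rng.multinomial(depth, expo / expo.sum())
    return counts

Y = simulate_mln_dlm(T=50, D=5, depth=2_000)
print(Y[:3])
```

Richer versions swap the identity state evolution for other transition matrices (trends, seasonality), which is the flexibility the state-space framing buys.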

Here is a better link to the new paper:

arxiv.org/abs/2410.05548

1 year ago

In short, we have already made public a fair amount of benchmarking studies against real data. Your manuscript just didn't cite any of it.

1 year ago

New results soon to be released:

We have developed specialized PIMs that account for uncertainty in sparsity assumptions. 6 datasets with ground truth, comparing against 8 methods. When our assumptions hold (first 4 datasets) our methods do well. When violated (last two) they fail gracefully.

1 year ago

Compositional data analysis enables statistical rigor in comparative glycomics - PubMed Comparative glycomics data are compositional data, where measured glycans are parts of a whole, indicated by relative abundances. Applying traditional statistical analyses to these data often results ...

www.nature.com/articles/s41...

Here is a completely independent group validating our methods for glycomics. Again, main conclusion -- the problem lies in normalizations and scale uncertainty is critical.

1 year ago

Vaginal metatranscriptome meta-analysis reveals functional BV subgroups and novel colonisation strategies - PubMed Our findings highlight a need to focus on functional rather than taxonomic differences when considering the role of microbiomes in disease and identify pathways for further research as potential BV tr...

pubmed.ncbi.nlm.nih.gov/39709449/

Another real data validation. @ggloor.bsky.social said he sat on this data for almost 10 years because existing methods were giving nonsensical answers. Only when uncertainty in scale was considered did things start to make sense.

1 year ago

Explicit Scale Simulation for analysis of RNA-sequencing with ALDEx2 In high-throughput sequencing (HTS) studies, sample-to-sample variation in sequencing depth is driven by technical factors, and not by variation in the scale (e.g., total size, microbial load, or tota...

Here @ggloor.bsky.social found our methods drastically improve metatranscriptomic analyses as well -- again real data analyses but less of a focus on benchmarking.

www.biorxiv.org/content/10.1...

1 year ago