How every layer of science's "self-correcting machinery" failed when Iva Veseli and I simply wanted to reproduce the findings of a high-profile study on gut microbiome and autism:
merenlab.org/2026/04/15/u...
Posts by Gennady Gorin
I am holding out for measure theory in Vol. VII
If you use dim. reduction, you may be interested in two recent preprints we've posted on contrastive PCA:
The Rayleigh Quotient and Contrastive Principal Component Analysis I & II
w/ Maria Carilli & Kayla Jackson. They cover a lot of ground from theory to practice. 1/🧵
Latest from Shendure & Qiu labs (@cxqiu.bsky.social)! We combined a new 4M cell mouse whole embryo scATAC-seq atlas (E10-P0), millions of 'evolutionarily coherent' orthologs from 241 mammalian genomes (Zoonomia), and the CREsted CNN framework (@steinaerts.bsky.social).
Friendly reminder that ordinal values admit an ordering, but no notion of distance. Without a notion of distance even the concept of linear models is ill-defined. Please do not use ordinary least squares to analyze ordinal data.
For gratuitous discussion see betanalpha.github.io/assets/chapt....
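A minimal sketch of why the missing notion of distance matters, using made-up responses on a five-level scale (the groups, values, and both codings are hypothetical). With a 0/1 group indicator, the OLS slope is just the difference in coded group means, so it depends entirely on the arbitrary numbers assigned to the ordered levels:

```python
from statistics import fmean

# Hypothetical ordinal responses (levels 1..5) for two groups.
group_a = [1, 1, 5]
group_b = [3, 3, 3]

# Two codings that both preserve the ordering of the levels.
equal_spacing = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched_top = {1: 1, 2: 2, 3: 3, 4: 4, 5: 100}


def ols_group_slope(a, b, coding):
    """Slope of OLS on a 0/1 indicator (a=0, b=1): mean(b) - mean(a) after coding."""
    coded_a = [coding[x] for x in a]
    coded_b = [coding[x] for x in b]
    return fmean(coded_b) - fmean(coded_a)


slope_equal = ols_group_slope(group_a, group_b, equal_spacing)
slope_stretched = ols_group_slope(group_a, group_b, stretched_top)
print(slope_equal, slope_stretched)
```

Both codings respect the same ordering, yet the sign of the estimated group effect flips between them, so the OLS answer reflects the coding rather than the data.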
Oh, fantastic. You might also be interested in www.nature.com/articles/s41..., where we go into a great deal of detail on the multimodal and technical modeling.
You know what's better than inflating the variation of your observational model to heuristically accommodate "outliers"? Actually modeling the contaminating data generating process.
Eukan: a fully automated nuclear genome annotation pipeline for less studied and divergent eukaryotes academic.oup.com/nargab/artic... 🧬💻🧪 github.com/BFL-lab/eukan
“Early Modern Memes: The Reuse and Recycling of Woodcuts in 17th-Century English Popular Print” by @katiesisneros, on the interplay of repetition, context + meaning in woodcuts and the parallels to meme culture of today: publicdomainreview.org/essay/e...
As for ethics, that's for the ethicists. Which is to say there's a lot of ground aside from ethics and bare usefulness. In the petroleum industry just as much as the tech industry. I should hope that ground would also come into it.
Sure. Plenty of people post to massive audiences about bioplastics imminently taking over, or about photovoltaics being a dead end. Both are misinformation-adjacent, and maybe these people would be better off starting from the stronger point that the oil industry has been very useful.
Sure. So what?
Or to put it another way: I am a chemical engineer from Texas. I can go on about how useful the oil industry is and has been for 150 years or so. And I would be correct. But so what? Is that a convincing middle ground to stop at? Maybe it is. But I am not sure everyone would agree.
I do not believe there is. Because beginning and ending the discussion there (so treating every other aspect as beneath consideration or immaterial) is itself extreme.
"clever" satire in the genre of formatmypaper.com gets tedious if there is no trace of human execution beyond half an idea behind it. It's because layers of contempt for the idea and the reader are the opposite of compelling
Every paper invents its own idiosyncratic and unique analysis. In seeming contradiction, they also crib or inherit analyses, whether sensible or not, from previous papers. Perhaps someday these best practices will coalesce to be summarized in such a paper (they would have to be created first)
The Devil and Daniel Webster AI
Einstein AI C&D charging
bsky.app/profile/gori...
RIP ishmael you would have loved this 🐋
Ambient RNA & barcode swapping are serious issues in single-cell genomics. Existing tools include CellBender, scAR, DecontX & SoupX. We have developed CellSweep, which is faster (in some cases by a lot) and much more accurate. Extensively tested and benchmarked. www.biorxiv.org/content/10.6... 1/
New COSIG update! 3️⃣4️⃣
I am currently looking for work, so do not hesitate to reach out if this experience sounds interesting! You can find the rest of my portfolio at gennadygorin.github.io. 13/13
This is a complex and exciting topic, and we have a lot to learn about modeling, artifact detection, and the sheer range of unexpected biology we can learn from existing experiments!
Big thanks to @lindabgoodman.bsky.social, and to the Gracheva/Bagriantsev labs that published the data! 12/
But this is a squirrel hypothalamus dataset, and here we also see the RNA coding for hypothalamic neuropeptides, distributed as in the real cells: Pomc, Sst, Npy, Agrp, and Cartpt! Perhaps a trace of RNA secretion or trafficking to the dendrites? 11/
Yet even if we do our filtering, some genes still come up non-Poisson, and many of them end up mutually correlated! We have seen this before with hemoglobins and mitochondrial genes, because their packaging is incorporated into empty drops and the RNA are captured together. 10/
Adjust your read filtering accordingly: some TSO content is fine, but feature barcoding primers are bad news.
Not all the artifacts are so easy to spot, and I go into a great deal of detail about different classes of issues. These are only a starting point. 9/
Feature barcoding primers!
Somehow 👀 the primers are missing their UMIs and antibody capture sequences. The result: vast numbers of reads with TSO/primer/FB cell barcode/poly(A). If the barcode/poly(A) is close enough to a real transcript, it gets counted, giving outliers. 8/
Some of them are TSO artifacts that happen to be similar to poly(A) regions in the transcriptome. Some are real reads that happened to get filtered out in the second pipeline. But it turns out that the vast majority come from... 7/
It turns out that some of them are pipeline artifacts! If we rerun the same dataset with a different aligner, many of these outliers disappear. Can you guess what these reads really are? 6/
But this preprint is not about the molecular soup: it is about the genes that stubbornly refuse to look like soup. They are clear outliers and look nothing like Poisson. 5/
Since the statistics of empty drops are so simple, we can use them to interrogate models of technical noise. Empty drops are nearly Poisson; by investigating how much they diverge from the Poisson, we can say something about the experiment. 4/
www.biorxiv.org/content/10.1...
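The near-Poisson check from 4/ can be sketched with a Fano factor (variance divided by mean), which is 1 for a Poisson distribution. This is an illustration on simulated counts only, not the preprint's actual procedure: the two "genes", their rates, the droplet count, and the gamma-mixed rate used to mimic an overdispersed gene are all assumptions.

```python
import math
import random
from statistics import fmean, variance


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fano_factor(counts):
    """Sample variance over sample mean; close to 1 for Poisson counts."""
    m = fmean(counts)
    return variance(counts) / m if m > 0 else float("nan")


def simulate(seed=0, n_droplets=5000):
    """Simulate one Poisson gene and one gene whose rate varies per droplet."""
    rng = random.Random(seed)
    # Hypothetical "soup" gene: the same rate in every empty droplet.
    soup_gene = [sample_poisson(rng, 2.0) for _ in range(n_droplets)]
    # Hypothetical outlier gene: gamma-distributed rate (mean 2), then Poisson.
    outlier_gene = [
        sample_poisson(rng, rng.gammavariate(2.0, 1.0)) for _ in range(n_droplets)
    ]
    return soup_gene, outlier_gene


soup_gene, outlier_gene = simulate()
print("soup gene Fano:", fano_factor(soup_gene))
print("outlier gene Fano:", fano_factor(outlier_gene))
```

A Fano factor well above 1 only flags a gene as non-Poisson. Deciding whether it reflects an artifact or biology still takes the kind of read-level inspection described in the rest of the thread.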