Sorry, I meant min, not max.
Posts by Gregor Sturm
Thanks for the response!
The thing is that the original TCRdist uses max(4, 4-score), so it caps scores at 4.
With tcrBLOSUM that would be max(2, 2-score), which would destroy all the strong signal that you have in, e.g., the C residue.
So you'd suggest using 2-score without the max instead?
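For reference, here is a minimal sketch of the TCRdist-style conversion from a substitution matrix to a per-residue distance. Identical residues get 0, and mismatches get min(offset, offset - score). The offset is a parameter, so the value 2 discussed for tcrBLOSUM and the uncapped variant can be compared directly. The dictionary holds only a handful of BLOSUM62 entries for illustration. The helper names are hypothetical and are not scirpy's API. The sequence distance is a simplification that only sums over equal-length sequences and ignores gaps and CDR weighting.

```python
# A few symmetric BLOSUM62 entries, just enough for the examples below.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("C", "C"): 9, ("S", "S"): 4,
    ("A", "C"): 0, ("A", "S"): 1, ("C", "S"): -1,
}


def score(matrix, a, b):
    """Look up a substitution score, treating the matrix as symmetric."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def aa_distance(matrix, a, b, offset=4, cap=True):
    """TCRdist-style per-residue distance (hypothetical helper).

    Identical residues get 0. Otherwise the distance is offset - score,
    capped at `offset` when cap=True, i.e. min(offset, offset - score).
    """
    if a == b:
        return 0
    d = offset - score(matrix, a, b)
    return min(offset, d) if cap else d


def seq_distance(matrix, s1, s2, **kwargs):
    """Sum of per-position distances for two equal-length sequences (no gaps)."""
    if len(s1) != len(s2):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(aa_distance(matrix, a, b, **kwargs) for a, b in zip(s1, s2))
```

With the cap, a C/S mismatch (score -1) contributes 4, the same as any other strong mismatch. Without the cap it contributes 5. This is the kind of signal that a low cap flattens out.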
Hi @pmeysman.bsky.social,
we are trying to integrate tcrBLOSUM into scirpy, our library for scTCRseq analysis.
Specifically, we want to adapt the TCRdist algorithm to use tcrBLOSUM substitution values.
How did you turn the substitution matrix into a distance matrix in your paper?
There's another scverse conference this year and it will be amazing!
Register now: www.eventbrite.com/e/scverse-co...
AFAIK, these are minor numerical differences. I would consider them equivalent.
Our benchmark + guidelines for atlas-level differential gene expression of single cells is online:
academic.oup.com/bib/article/...
Bottom line: Use pseudobulk + DESeq2 in simple and pseudobulk + DREAM in more complex settings.
Collab w/ @leonhafner.bsky.social @itisalist.bsky.social
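"Pseudobulk" means summing the raw counts of all cells from the same sample, typically within one cell type. The summed counts are then handed to a bulk method such as DESeq2 or dream. Below is a minimal NumPy sketch of the aggregation step only. It assumes a dense cells × genes count matrix, and `pseudobulk` is a hypothetical helper. The statistical testing itself is left to DESeq2/dream.

```python
import numpy as np


def pseudobulk(counts, sample_ids):
    """Sum raw counts over all cells of each sample (hypothetical helper).

    counts: (n_cells, n_genes) array of raw integer counts
    sample_ids: length-n_cells sequence of sample labels
    Returns (samples, summed): sorted unique labels and one summed row per sample.
    """
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)
    summed = np.vstack([counts[sample_ids == s].sum(axis=0) for s in samples])
    return samples, summed
```

For per-cell-type comparisons, subset the cells to one cell type first and call it once per subset.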
Register now for the best conference of the year!
📣 Mark your calendars! The 2025 edition of the scverse conference will take place on 17-19 November at Stanford University (US) scverse.org/conference20...
Call for abstracts and registrations coming soon!
Just released a new version of the @scverse.bsky.social cookiecutter template: github.com/scverse/cook...
Some highlights:
🔃 improved template sync (merge conflicts now show up as such)
🚀 use hatch as project manager
🔧 lots of fixes and documentation updates
Nice post!
How did you generate the DOI link for a blog post?
Blog post by @const-ae.bsky.social with a simple explanation of the manifold regression algorithm & code that underlies our paper “Analysis of multi-condition single-cell data with latent embedding multivariate regression” (doi.org/10.1002/eji....).
const-ae.name/post/2025-01...
Just released scirpy v0.21, now with GPU support for Hamming sequence distances and a brand-new tutorial for working with scTCR datasets of >1M cells: scirpy.scverse.org/en/latest/tu...
@scverse.bsky.social
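The Hamming distance between two equal-length CDR3 sequences is the number of positions at which they differ. Here is a minimal CPU sketch of the all-vs-all computation using NumPy broadcasting. `pairwise_hamming` is hypothetical and is not scirpy's implementation.

```python
import numpy as np


def pairwise_hamming(seqs):
    """All-vs-all Hamming distances for a list of equal-length sequences."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("Hamming distance requires equal-length sequences")
    # Encode each sequence as a row of character codes: shape (n_seqs, length)
    arr = np.array([[ord(c) for c in s] for s in seqs], dtype=np.int32)
    # Broadcast to (n, n, length) and count mismatching positions per pair
    return (arr[:, None, :] != arr[None, :, :]).sum(axis=2)
```

This needs O(n² × length) memory, so it only suits small inputs. Datasets with >1M cells need a chunked or sparse approach.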
🎉 Scanpy 1.11.0 is out, just after reaching 2000 stars on GitHub! 🎉
- sc.pp.sample replaces sc.pp.subsample, with many new features
- Sparse Dask support in PCA
- session-info2 package for more reproducible notebooks
See the release notes:
Been looking forward to this talk since @alexpeltzer.bsky.social told me about DSO in October!
I'd like to share DSO, a command line helper to build reproducible data science projects with ease.
It is an opinionated way to organize data science projects, built around data version control (DVC).
github.com/Boehringer-I...
We try to avoid that by using DSO with preprocessed data only. All the heavy lifting is done beforehand with Nextflow pipelines. Datasets of up to tens of GBs have worked well so far.
Finally, many thanks to my colleagues @alexpeltzer.bsky.social, Daniel Schreyer and Tom Schwarzl for testing, adopting, and contributing to DSO.
If you want to learn more, I'll be presenting this at a @nf-co.re bytesize talk: nf-co.re/events/2025/...
We built this at @boehringerglobal.bsky.social to meet the quality standards required for biomarker analysis in clinical trials.
But I think this is useful for any kind of data analysis project.
An exemplary PCA plot with a "preliminary" watermark.
One of my favorite features: automated watermarking of all plots in a quarto report. Nobody is going to publish my plots anymore before I think they're ready.
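How DSO implements the watermark isn't described here. As a hypothetical illustration of the idea only, this is one way to stamp a diagonal "PRELIMINARY" label onto a matplotlib figure. `add_watermark` and its styling defaults are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, e.g. when rendering a report
import matplotlib.pyplot as plt


def add_watermark(fig, text="PRELIMINARY"):
    """Overlay large, semi-transparent diagonal text across a whole figure."""
    fig.text(
        0.5, 0.5, text,
        fontsize=40, color="gray", alpha=0.4,
        rotation=30, ha="center", va="center",
        transform=fig.transFigure, zorder=10,
    )
    return fig


# Example: a toy PCA-style scatter plot with the watermark applied
fig, ax = plt.subplots()
ax.scatter([0, 1, 2], [2, 0, 1])
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
add_watermark(fig)
```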
It brings together the best tools:
- git, for code versioning
- dvc, for data versioning and tracking inputs and outputs
- jinja2, for templates
- uv, for Python dependency management
- quarto, for authoring reports
- hiyapyco, for hierarchical YAML config
- pre-commit, for linting
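"Hierarchical config" here means that a general configuration is combined with more specific ones, and the more specific values take precedence. The example below uses a project-wide file overridden by a stage-specific one. Here is a minimal pure-Python sketch of such a merge on already-parsed dicts. `deep_merge` and the config keys are hypothetical, and hiyapyco's actual merge semantics (e.g. for lists) may differ.

```python
import copy


def deep_merge(base, override):
    """Recursively merge two dicts; values in `override` win.

    Nested dicts are merged key by key; anything else (lists, scalars)
    is replaced wholesale. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical example: a general config, then a more specific override
project = {"thresholds": {"min_counts": 500, "max_mito": 0.2}, "organism": "human"}
stage = {"thresholds": {"max_mito": 0.1}}
config = deep_merge(project, stage)
```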
We (Chen Zhan!) just launched #sccomp for #Python!
Testing for differences in cell-type proportion in #singlecell #spatial data?
#sccomp is a mixed-effect Bayesian model that:
- uses a sum-constrained beta-binomial distribution
- detects outliers
- removes unwanted effects
github.com/MangiolaLabo...
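A beta-binomial lets the number of cells of one type in a sample vary more than a plain binomial would (overdispersion). Here is a standalone sketch of its probability mass function in a mean/precision parameterization. It is only the textbook distribution, not sccomp's sum-constrained model, and the parameterization is an assumption chosen for readability.

```python
import math


def beta_binomial_pmf(k, n, mean, precision):
    """P(K = k) for a beta-binomial with mean proportion `mean`.

    alpha = mean * precision, beta = (1 - mean) * precision;
    pmf = C(n, k) * B(k + alpha, n - k + beta) / B(alpha, beta),
    computed in log space with math.lgamma for numerical stability.
    """
    alpha = mean * precision
    beta = (1.0 - mean) * precision
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_beta_num = math.lgamma(k + alpha) + math.lgamma(n - k + beta) - math.lgamma(n + alpha + beta)
    log_beta_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(log_choose + log_beta_num - log_beta_den)
```

For example, with 1000 cells per sample and a mean proportion of 0.1, the expected count is 100 either way. The beta-binomial variance (precision 50) is far larger than the binomial's 90.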
(2) Finding the mistake, tracing it back to its origin, and fixing it was only possible because the data and scripts for building the atlas are publicly available and fully reproducible. github.com/icbi-lab/luca
(1) Maintaining a data resource is very much like maintaining software. It is never "done" but constantly improving.
Two years after publication of our single-cell lung cancer atlas, a user found a mistake in the EGFR status annotation of some patients. We fixed the issue, and the updated atlas is now on CELLxGENE: cellxgene.cziscience.com/collections/...
What are the takeaways from that? (1/3)