Cristian Pattaro (@cpattaro) Bsky

I've recently made some changes to the gallery of all my data visualisations 📊

Most examples now have links to the underlying source code! If you see something you like, you can see how it was made 🤩

#DataViz #RStats

1 week ago

14 things our PhD supervisors got right and why it mattered
PhD students reflect on how their supervisors made a meaningful difference, from quiet acts of kindness to career-shaping guidance.

14 things our PhD supervisors got right and why it mattered www.nature.com/articles/d41...

1 week ago

Our review "Integrating genetic data with biological insight: A practical guide to cis-Mendelian randomization" is now published in @ajhgnews.bsky.social, led by @vkarhune.bsky.social and Benji Woolf, with critical insight from Dipender Gill and Pallav Bhatnagar. Thread follows:

1 week ago

Cis-MR studies are not intrinsically superior to genome-wide MR studies, and algorithmically performed cis-MR analyses will rarely be optimal. But when performed with care, cis-MR is a powerful tool to inform about putative causal effects.

1 week ago

Caveat: it remains clear that a hypothesis-generating exploration cannot estimate a sample size in the traditional way, as there is no hypothesis on which to base the estimate (it is, in fact, exploratory; and in that case we would be talking about hypothesis testing, not prediction).
11/11

1 week ago

Lesson learned: when working in applied contexts, statisticians should advocate for robustness and lead the choice of sound methods, even in apparently blurry data situations, where #AI tools will not work any magic.
10/n

1 week ago

pmsampsize: Sample Size for Development of a Prediction Model
Computes the minimum sample size required for the development of a new multivariable prediction model using the criteria proposed by Riley et al. (2018) <doi:10.1002/sim.7992>. pmsampsize can be used to calculate the minimum sample size for the development of models with continuous, binary or survival (time-to-event) outcomes. Riley et al. (2018) <doi:10.1002/sim.7992> lay out a series of criteria the sample size should meet. These aim to minimise the overfitting and to ensure precise estimation of key parameters in the prediction model.

also check
cran.r-project.org/web/packages...

and feel free to add more resources down here

1 week ago
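To give a rough idea of what such a calculation involves, here is a minimal Python sketch of the three criteria for a binary outcome as described by Riley et al.: (1) an expected shrinkage factor of at least 0.9, (2) a difference of at most 0.05 between apparent and adjusted Nagelkerke R², and (3) estimating the overall outcome proportion within ±0.05. This is not the pmsampsize code. The function name, argument names, and defaults are hypothetical, and the inputs (anticipated Cox-Snell R², prevalence, number of candidate parameters) are assumptions you would take from prior studies. For real work, use pmsampsize itself.

```python
import math


def riley_binary_min_n(n_params, r2_cs, prevalence,
                       shrinkage=0.9, optimism=0.05, margin=0.05):
    """Hypothetical sketch of Riley et al.'s minimum-n criteria (binary outcome).

    n_params   -- number of candidate predictor parameters (p)
    r2_cs      -- anticipated (adjusted) Cox-Snell R-squared
    prevalence -- anticipated outcome proportion (phi)
    Returns (overall minimum n, [n for criteria 1, 2, 3]).
    """
    phi = prevalence
    if not 0 < phi < 1:
        raise ValueError("prevalence must be in (0, 1)")

    # Maximum attainable Cox-Snell R^2 for this prevalence:
    # 1 - exp(2 * lnL_null / n), with lnL_null / n = phi*ln(phi) + (1-phi)*ln(1-phi)
    lnl_null_per_n = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    max_r2_cs = 1 - math.exp(2 * lnl_null_per_n)
    if not 0 < r2_cs < max_r2_cs:
        raise ValueError("r2_cs must be in (0, max attainable Cox-Snell R^2)")

    def n_for_shrinkage(s):
        # n = p / ((S - 1) * ln(1 - R2_cs / S)); valid while R2_cs < S < 1
        return n_params / ((s - 1) * math.log(1 - r2_cs / s))

    # Criterion 1: expected uniform shrinkage factor of at least `shrinkage`
    n1 = n_for_shrinkage(shrinkage)

    # Criterion 2: apparent vs adjusted Nagelkerke R^2 differ by <= `optimism`,
    # expressed as the shrinkage factor S_VH that achieves this
    s_vh = r2_cs / (r2_cs + optimism * max_r2_cs)
    n2 = n_for_shrinkage(s_vh)

    # Criterion 3: overall outcome proportion estimated within +/- `margin` (95% CI)
    n3 = (1.96 / margin) ** 2 * phi * (1 - phi)

    # Round each up to whole participants; the required n is the largest
    per_criterion = [math.ceil(n1), math.ceil(n2), math.ceil(n3)]
    return max(per_criterion), per_criterion


n, parts = riley_binary_min_n(n_params=10, r2_cs=0.1, prevalence=0.5)
print(n, parts)
```

With these illustrative inputs (10 parameters, Cox-Snell R² of 0.1, prevalence 0.5), criterion 1 asks for 850 participants and criterion 3 for 385, so the shrinkage criterion drives the answer. Taking the maximum across criteria is the design choice: every criterion must hold at once.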

! the good news 😀

However, methods and [importantly] software do exist to estimate the minimum sample size required to obtain reliable estimates and prevent overfitting 8/n

1 week ago

...and extends to all #AI methods
pubmed.ncbi.nlm.nih.gov/40461350/
7/n

1 week ago

The issue of insufficient sample size is common to other fields www.nature.com/articles/s41... 6/n

1 week ago

A recent systematic review www.sciencedirect.com/science/arti... @jclinepi.bsky.social highlights that most cancer research studies don't check whether N is large enough to
1- guarantee reliable predictions
2- prevent overfitting
When research is conducted with insufficient N 👉 reproducibility remains a mirage 🏝️

1 week ago

Among non-statisticians, it is also commonly believed that #ML outperforms traditional prediction models, particularly when N is small (if anything, the opposite is true). 4/n

1 week ago

However, many advocate the widespread use of #ML tools as magic solutions for small datasets with a large number of predictors. This is especially common among non-statisticians, and grant applications are often flooded with cloudy methods promising exaggerated results. 3/n

1 week ago

Most agree that for too long #statistics forgot the data in favor of the models. The time has come, and the tools now exist, to put data at the centre (big data, complex data, any data) 2/n

projecteuclid.org/journals/sta...

1 week ago

On machine learning (#ML) & sample size

Inspired by recent posts, we held an interesting discussion club in our Biostats & Epi group

Bottom line: talking about sample size estimation in #ML is taboo in many fields. It shouldn't be, and there are many reasons why

#stats #biostats
1/n

1 week ago

🚨 New preprint:
www.biorxiv.org/content/10.6...

We studied the dynamics of maternal gene expression over the course of healthy pregnancy based on weekly samples 👇

1 week ago

For our April journal club, @ozvanbocher.bsky.social will present on "Making the most of whole-genome sequencing data for rare variant association tests." This is another talk you won't want to miss!

📅 Friday, April 10
⏰ 8 am (PDT), 11 am (EDT), 5 pm (CEST)
🔗 iges.memberclicks.net/assets/IGES_...

1 week ago

ERA’s ABCDE framework for kidney disease prevention: turning the WHO kidney health resolution into action
Abstract. In 2025, the World Health Assembly of the World Health Organization (WHO) adopted a resolution on reducing the burden of noncommunicable diseases

#CKD: from the elephant in the room to a recognized health priority

academic.oup.com/ndt/article/...

1 week ago

In the Interim... | A Podcast by Berry Consultants
A podcast on statistical science and clinical trials.

I deeply believe causal thinking is core to good DS, regardless of whether you do analytics, ML, etc.

A new-to-me resource is the excellent In the Interim podcast on clinical trial design

Comes out weekly and takes the sting out of my Monday commute

Check it out! www.berryconsultants.com/resources/po...

2 weeks ago

Some slides from a recent talk on missing heritability.

www.dropbox.com/scl/fi/kvogj...

2 weeks ago

Inclusion bias affects common variant discovery and replication in a health-system linked biobank
We quantify inclusion bias in a health-system-linked biobank using classification models to distinguish enrolled individuals from the background population. To evaluate its impact on genetic findings ...

Inclusion bias in #GWAS of #EHR traits

"By weighting the sample using inverse probability weights derived from probabilities of enrollment, we replicate 54% more known GWAS variants" 😱

#statgen

www.cell.com/ajhg/fulltex...

2 weeks ago
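The weighting idea quoted above can be sketched in a few lines: each enrolled person gets weight 1 / P(enrolled), so groups under-represented in the biobank count for more. This is a minimal illustration, not the study's pipeline. It assumes enrollment probabilities have already been estimated by some classification model (not shown), and the function names and toy numbers are hypothetical. In an actual GWAS the weights would enter a weighted regression rather than a simple mean.

```python
def ipw_weights(p_enroll, normalize=True):
    """Inverse probability weights w_i = 1 / P(enrolled_i).

    p_enroll  -- estimated enrollment probabilities of the enrolled individuals
    normalize -- rescale so the weights average 1 (weighted means are unchanged)
    """
    if any(not 0 < p <= 1 for p in p_enroll):
        raise ValueError("enrollment probabilities must be in (0, 1]")
    w = [1.0 / p for p in p_enroll]
    if normalize:
        mean_w = sum(w) / len(w)
        w = [x / mean_w for x in w]
    return w


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Hypothetical toy data: people with the trait are 4x more likely to enroll.
p_enroll = [0.8, 0.8, 0.2, 0.2]
has_trait = [1, 1, 0, 0]

naive = sum(has_trait) / len(has_trait)                  # prevalence among enrolled
weighted = weighted_mean(has_trait, ipw_weights(p_enroll))  # reweighted estimate
print(naive, weighted)
```

In this toy case the naive prevalence among enrolled people is 0.5, while reweighting brings it down to about 0.2, the value implied for the background population when carriers enroll at 0.8 and non-carriers at 0.2.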

Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)
There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

#statistics is based on the data and on the models that might have generated those data. For too long, many have ignored the former and focused only on the latter.

Re-upping Leo Breiman's powerful 2001 piece on the two cultures

projecteuclid.org/journals/sta...

2 weeks ago

So, kids are seen as obstacles to a career. More so for women, but men are not safe either. And if this is the case in academia, is it worse in other jobs?
Much to fix in our systems.

3 weeks ago

Venice’s hidden islands are being resurrected – digitally.

ERC grantee Ludovica Galeazzo at University of Padua uses 3D scans and underwater robots to map 500 years of lost history, showing us a way to preserve heritage worldwide.

🔗 t.co/u6K5cQalCO

#FrontierResearch

3 weeks ago

KidneyGenAfrica multi-cohort Genome-wide association study and polygenic prediction of kidney function in 110,000 Africans - Nature Communications
Here, the authors conduct a GWAS for eGFR, then a three-stage regional meta-analysis using GWAS summary data from the Eastern, Western, and Southern African geographical regions. Followed by fine mapp...

Delighted to see our new paper published in @nature communications.

The largest GWAS of kidney function in Africa.

Lower frequency of APOL1 high-risk variants in continental African populations.

www.nature.com/articles/s41...

3 weeks ago

#nature #genomics #precisionmedicine #healthequity #gwas #kidneydisease #kidneygenafrica #africangenomics #naturecommunications #datascience #globalhealth | Segun Fatumo
I’m excited to share our latest work, published in #Nature Communications This study, delivered through the KidneyGenAfrica Consortium, represents the largest genome-wide association study (GWAS) of ...

KidneyGenAfrica rocks

👏 @sfatumo.bsky.social & team!

#ckd #africa #gwas

www.linkedin.com/posts/segun-...

3 weeks ago

I’m honored and excited to join the Board of Directors! IGES is home to such a vibrant and welcoming scientific community, I look forward to helping it continue to thrive! 💟🧬

Join us for IGES 2026 in beautiful Estérel, QC 🍁 Abstract submission is open until May 30
www.geneticepi.org/2026-annual-...

1 month ago

This is an amazing repository of datasets that are helpful for self-education on key #stats principles

1 month ago

G-EE

A very much needed (and brilliant) reflection for Mendelian randomization

1 month ago

“SVM, NN and RF may need over 10 times as many events per variable to achieve a stable AUC and a small optimism than classical modelling techniques such as LR. This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available.“
#stats

1 month ago
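A quick back-of-the-envelope calculation shows why "very large data sets" follows from the quote. The sketch below converts an events-per-variable (EPV) target into events and participants. The scenario is hypothetical: 20 candidate parameters, an outcome in 10% of people, and a baseline of 10 EPV for logistic regression (a commonly cited rule of thumb, used here only as an assumption), multiplied by 10 for the modern techniques.

```python
import math


def events_needed(n_params, epv):
    """Events required to reach a target events-per-variable (EPV) ratio."""
    return math.ceil(n_params * epv)


def participants_needed(n_params, epv, prevalence):
    """Participants required when only a fraction `prevalence` experience the event."""
    # round() first so float noise (e.g. 2000.0000000000002) does not bump the ceiling
    return math.ceil(round(events_needed(n_params, epv) / prevalence, 9))


# Hypothetical scenario: 20 candidate parameters, outcome in 10% of people.
baseline_epv = 10              # assumed rule-of-thumb EPV for LR
ml_epv = 10 * baseline_epv     # "over 10 times as many", per the quote
for label, epv in [("LR", baseline_epv), ("SVM/NN/RF", ml_epv)]:
    print(label, events_needed(20, epv), participants_needed(20, epv, 0.10))
```

Under these assumptions, LR needs about 200 events (2,000 participants), while SVM, NN or RF would need about 2,000 events (20,000 participants).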
