Paulius Alaburda (@alaburda) Bsky

Models as Prediction Machines: How to Convert Confusing Coefficients Into Clear Quantities - Julia M. Rohrer, Vincent Arel-Bundock, 2026 Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear model...

Good news everyone 🥳 Our (w @vincentab.bsky.social) primer on models as prediction machines (with the marginaleffects package) is finally officially published!

journals.sagepub.com/doi/10.1177/...

15 hours ago 269 107 8 6

ggsql: A grammar of graphics for SQL Introducing ggsql, a grammar of graphics for SQL that lets you describe visualizations directly inside SQL queries.

I am excited beyond description to lift the veil on what we have been working on in 2026:

Please meet ggsql! A new extension of the SQL language for creating visualisations using the grammar of graphics. Read all about it in the blog post or visit the website at ggsql.org

2 days ago 381 83 13 9

⚠️ Be careful when using area charts!

➡️ A short but helpful thread with multiple examples:

(1/13)

#dataviz #informationdesign #datavisualization

6 days ago 11 4 1 2

How to Estimate a Correlation, and What It Means for Science | Computational Psychology Introduction In Part 1 of this series, we showed that six different methods for estimating individual means—James-Stein, classical true score estimation, empirical Bayes, ridge regression, and hierarc...

This is such a good write up (and also validates the "just be bayesian" approach [you get all the shrinkage and extensions to differing numbers of observations per subject for free!])
haines-lab.com/post/how-to-...

1 week ago 23 5 0 2

Keeping the Current Month Selected in your Power BI Slicers YouTube video by ProcureSQL

Getting tired of your #PowerBI reports opening with a prior month selected? I made a video that shows 3 ways to handle that.

I'm a fan of #3.

Check out the video on the ProcureSQL YouTube channel.

🎥 www.youtube.com/watch?v=j-3H...

#PowerBI #MicrosoftFabric

1 week ago 6 4 0 1

Hi #rstats folks, I can't seem to find examples of deploying a {ragnar} knowledge store, what's your preferred approach? Basically, how do you serve the duckdb file to the chat application? I'm considering just using S3 but that would probably cause slow reads?

1 week ago 0 0 0 0

Being a Staff+ Data Scientist in 2026 Brandon Rohrer I became a data scientist in 2013 when the title was young. It was so new that most companies had no idea what a data scientist should be doing, only that they desperately needed one or they would be left behind. Sound familiar? I've tried to survey the job description of data science a couple of times with varying degrees of success, most recently to go with some informal recommendations for creating data science degree programs. Together with a group of colleagues we tried to summarize what data scientists do and the data science subtypes of maker, oracle, detective, generalist. But in the face of changing expectations this doesn't feel like enough anymore. It's time for a refresh. A brief and biased history of the Data Scientist role In the beginning... The field of data science was named in 1997, and the discipline has existed by other names for a very long time. After all, people have been answering questions using data for thousands of years. When data science first got huge, organizations expected data scientists to spin straw into gold—to transform unorganized data archives into profit. Big Data, it was believed, held inherent value, which only needed to be coaxed into cash form. This rarely panned out, so the approach evolved into a) data scientists produce "insights" and then b) "insights" generate profit. This also proved elusive in the end.

Hey #datascience people, new blog in which I aim to describe the complexities of Staff+ data science (and adjacent) roles. Let me know what I missed.

brandonrohrer.org/ds_roles

1 week ago 33 6 2 3

After months, I was finally able to combine both my hurricane landfall and return period datasets for the U.S. into one massive infographic.

I absolutely love this. What an insane map to look at!

1 week ago 361 80 33 7

Pictogram showing number of bicycles that could not be fully repaired in a repair cafe, showing an average of 18% not fully repaired. Breakdown into age of bicycle shows little difference in how easy it is to repair. Each icon represents 25 bikes.

A very quick pictogram 📊 for this week's #TidyTuesday data about repair cafes 🛠️ where I focused in on how easy it is to repair (old) bicycles 🚲

Code: github.com/nrennie/tidy...

#RStats #DataViz #ggplot2

1 week ago 25 2 1 0

Any recommendations for a (recent-ish) workflow or tutorial for mixed-effects modelling of experience sampling data in R? Looking for something to hand to my master's student.

2 weeks ago 13 4 5 0

Data Visualization with Textiles This site is notes and drafts that may someday become a Handbook of Data Visualization with Textiles.

If you like data, or making things with textile crafts (sewing, knitting, crochet, weaving, etc.), I've finally written something up about data visualization with textiles. More words coming over the next few months! #DHmakes

2 weeks ago 189 79 7 7

Smoothed Line Faux-Pax's (and Other Things UI/UX Can't Teach You About Data Visualisation) - EXPLORATIONS IN DATA STORYTELLING WITH POWER BI → The explanation → Jump to the Point Power BI development is a relatively straightforward process when managed by one individual start to finish. But when the development process is shared among tea...

Profuse apologies for the profuse use of em-dashes.

Prompted by common sightings of smooth lines in the wild.

kerrykolosko.com/smoothed-lin...

3 weeks ago 21 8 4 2

Embracing Bayesian Methods in Clinical Trials This Perspective discusses the importance of the US Food and Drug Administration’s draft guidance on the use of bayesian methods in clinical trials because it underscores its commitment to modernizing...

Celebrating the draft FDA Bayesian guidance document with our perspective in @jama.com. Honored to co-author w/Jack Lee (MD Anderson), Lisa LaVange (past director of Office of Biostatistics FDA CDER&president of ASA),& my Bayesian inspiration Sir David Spiegelhalter jamanetwork.com/journals/jam...

4 weeks ago 53 14 2 1

Part 2 of my shrinkage estimator series is out! Part 1 covered the univariate case, but now we dive into multivariate shrinkage 🤓

We cover Spearman's classic correlation disattenuation formula, multivariate James-Stein estimators, and hierarchical methods too

haines-lab.com/post/how-to-...

3 weeks ago 42 15 2 3

GitHub - eitsupi/dlin: dbt lineage analysis CLI that parses SQL files directly, written in Rust. No dbt compile, no manifest.json. Designed for AI agents and CI pipelines.

I built a CLI tool for analyzing dbt project lineage, with reference to best practices for building AI agent-friendly CLIs.

dlin parses SQL directly, so no manifest.json, dbt compile, or Python is required.
Written in Rust, single binary. Outputs JSON, Mermaid, DOT and more.
github.com/eitsupi/dlin

3 weeks ago 6 2 0 0

Nice paper on effective sample size for Kaplan-Meier survival curve estimates: www.tandfonline.com/doi/full/10.... #Statistics #StatsSky

3 weeks ago 17 5 0 0

I’m talking to some undergrads next week and trying to compile a list of the best ways to tap into the data world. #datasky what are the most up-to-date Bsky starter packs and lists these days? I lost sight once I got my feed to a good spot

3 weeks ago 12 4 5 0

Screenshot of a html report with an embedded table and a separate button to download the data contained in that table.

Cool little package on CRAN for adding a download data button into html files created from your Rmd.

cran.r-project.org/web/packages...

4 weeks ago 21 7 0 0

Line chart with four coloured lines, with added symbols and direct labels for accessibility.

A new, work-in-progress #RStats package which

📊 Tries to automatically choose the best chart type from data types and values
🎨 Uses accessible colours, with added labels and shapes
📈 Has cleaner, more readable default #ggplot2 styling

in just one line

ggauto(df$v1, df$v2, df$v3)

#DataViz

1 month ago 47 9 2 3

I made a tiny tool for quickly sharing small datasets (< ~1000 rows) without uploading any data to a server.

🔗 ziptbl.com

It compresses the data into the link itself, so there’s no account, hosting, or storage layer involved.

Here's Florence Nightingale's famous 📊 data:
ziptbl.com#d=eNpdlE-LGz...
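
The post does not say how the data ends up in the link, so the following is only a sketch of the general idea, not ziptbl's actual implementation. It assumes the table is serialised as CSV text, compressed with zlib, and written as URL-safe base64 into the fragment after `#d=`. The function names and the `example.com` address are hypothetical. The `eN` prefix in the link above is what base64-encoded zlib output at the highest compression level starts with, but that is a hint rather than confirmation.

```python
import base64
import zlib


def encode_table(csv_text: str) -> str:
    """Compress CSV text and return a URL-safe payload (assumed scheme)."""
    compressed = zlib.compress(csv_text.encode("utf-8"), 9)
    # URL-safe base64 avoids '+' and '/'; padding is stripped to shorten the link.
    return base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")


def decode_table(payload: str) -> str:
    """Reverse encode_table: restore padding, decode, decompress."""
    padded = payload + "=" * (-len(payload) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")


def build_link(csv_text: str, base_url: str = "https://example.com/") -> str:
    """Put the payload in the URL fragment (hypothetical base URL)."""
    return f"{base_url}#d={encode_table(csv_text)}"


if __name__ == "__main__":
    sample = "month,count\nApr,10\nMay,12\nJun,11\n"
    link = build_link(sample)
    print(link)
    print(decode_table(link.split("#d=", 1)[1]) == sample)
```

Because browsers do not send the fragment part of a URL to the server, a scheme like this keeps the data on the client, which matches the post's point about there being no storage layer.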

1 month ago 304 95 14 12

Calculating pi from coin flips (without randomness) YouTube video by Stand-up Maths

This video is *bananas*—you can estimate π from a ton of random coin flips bc the expected prop of heads/total after 50%+ of the flips are heads = π/4

Here's an #rstats version I made of Matt's Python code in the video gist.github.com/andrewheiss/...

With 10 million random flips I got 3.131381
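
For readers who want to try the idea, here is a minimal Python sketch. It is not the code from the video or the linked gist. It assumes the procedure is: keep flipping until heads first outnumber tails, record heads/total at that moment, start over, and average the recorded ratios, whose expected value is π/4. The function name and the seed are made up for illustration.

```python
import random
from typing import Iterable


def estimate_pi(flips: Iterable[bool]) -> float:
    """Estimate pi from a sequence of coin flips (True means heads).

    The sequence is cut into runs; each run ends the first time heads
    outnumber tails, and contributes its heads/total ratio.
    """
    heads = 0
    total = 0
    ratios = []
    for is_heads in flips:
        total += 1
        heads += 1 if is_heads else 0
        if 2 * heads > total:  # heads now outnumber tails
            ratios.append(heads / total)
            heads = 0
            total = 0
    if not ratios:
        raise ValueError("no run finished; supply more flips")
    # The mean ratio approximates pi/4, so scale by 4.
    return 4 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    rng = random.Random(2026)  # arbitrary seed, for repeatability only
    flips = (rng.random() < 0.5 for _ in range(1_000_000))
    print(estimate_pi(flips))
```

Some runs take a very long time to finish, so convergence is slow, and an unfinished final run is simply dropped here. Both are reasons the result is only approximate, even with millions of flips.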

1 month ago 64 12 5 0

“This Is Not The Computer For You” · Sam Henri Gold Sam Henri Gold is a product design engineer building playful, useful software.

This is the best product review I’ve read: samhenri.gold/blog/2026031...

It doesn’t matter if you’re interested in the MacBook Neo or not; this is the kind of essay that makes you think about the potential of technology and the joys of exploration.

1 month ago 1841 632 35 146

Many journals require data and code sharing. Yet code is still rarely shared and datasets are often hard to reuse.

Our new paper introduces the @sortee.bsky.social guidelines for data & code quality control in ecology & evolution, developed by 26 experienced data editors.

📄 doi.org/10.24072/pcj...

1 month ago 88 62 1 1

Thank you for citing #tidyplots 🙏

Lalichat Ariyakulkiat et al. Smoke water promotes root meristem activity and cell elongation but inhibits root hair growth under phosphorus deficiency in rice. South African Journal of Botany (2026).

doi.org/10.1016/j.sa...

#rstats #dataviz #phd

1 month ago 6 1 0 0

Omg I'm so using this - I'm pretty sure we debug functions the same way 😅

1 month ago 1 0 0 0

added it to github.com/tjmahr/WrapR...

1 month ago 8 1 2 0

I had (kinda jokingly?) wondered if there was a Skill that helps write Skills... and there is!

From the Anthropic Skills marketplace: github.com/anthropics/s...

1 month ago 27 5 1 0

Using Quarto to Write a Book I’ve spent the last couple of months revising my Data Visualization book for a second edition that, ideally, will appear some time in the next twelve months. As with the first edition, I’ve posted a c...

Using Quarto to write (and typeset) a book.

1 month ago 240 56 7 2

Great for iterating over many similar tables, otherwise ehhh

1 month ago 1 0 0 0

Who needs PowerPoint when you have Quarto and @emilhvitfeldt.bsky.social's extensions?!

I wanted to make an image-heavy presentation. Usually, I'd reach for Keynote.

But Emil shared quarto-revealjs-editable, which let me stay in Quarto land 🥰 Thanks, Emil!

Find it here: github.com/EmilHvitfeld...

1 month ago 97 16 1 1
