Good news everyone 🥳 Our (w @vincentab.bsky.social) primer on models as prediction machines (with the marginaleffects package) is finally officially published!
journals.sagepub.com/doi/10.1177/...
Posts by Paulius Alaburda
I am excited beyond description to lift the veil on what we have been working on in 2026:
Please meet ggsql! A new extension of the SQL language for creating visualisations using the grammar of graphics. Read all about it in the blog post or visit the website at ggsql.org
⚠️ Be careful when using area charts!
➡️ A short but helpful thread with multiple examples:
(1/13)
#dataviz #informationdesign #datavisualization
This is such a good write-up (and also validates the "just be bayesian" approach [you get all the shrinkage and extensions to differing number of observations per subject for free!])
haines-lab.com/post/how-to-...
Getting tired of your #PowerBI reports opening with a prior month selected? I made a video that shows 3 ways to handle that.
I'm a fan of #3.
Check out the video on the ProcureSQL YouTube channel.
🎥 www.youtube.com/watch?v=j-3H...
#PowerBI #MicrosoftFabric
Hi #rstats folks, I can't seem to find examples of deploying a {ragnar} knowledge store, what's your preferred approach? Basically, how do you serve the duckdb file to the chat application? I'm considering just using S3 but that would probably cause slow reads?
Being a Staff+ Data Scientist in 2026
Brandon Rohrer
I became a data scientist in 2013 when the title was young. It was so new that most companies had no idea what a data scientist should be doing, only that they desperately needed one or they would be left behind. Sound familiar? I've tried to survey the job description of data science a couple of times with varying degrees of success, most recently to go with some informal recommendations for creating data science degree programs. Together with a group of colleagues we tried to summarize what data scientists do and the data science subtypes of maker, oracle, detective, generalist. But in the face of changing expectations this doesn't feel like enough anymore. It's time for a refresh.
A brief and biased history of the Data Scientist role
In the beginning...
The field of data science was named in 1997, and the discipline has existed by other names for a very long time. After all, people have been answering questions using data for thousands of years. When data science first got huge, organizations expected data scientists to spin straw into gold: to transform unorganized data archives into profit. Big Data, it was believed, held inherent value, which only needed to be coaxed into cash form. This rarely panned out, so the approach evolved into a) data scientists produce "insights" and then b) "insights" generate profit. This also proved elusive in the end.
Hey #datascience people, new blog in which I aim to describe the complexities of Staff+ data science (and adjacent) roles. Let me know what I missed.
brandonrohrer.org/ds_roles
After months, I was finally able to combine both my hurricane landfall and return period datasets for the U.S. into one massive infographic.
I absolutely love this. What an insane map to look at!
Pictogram showing number of bicycles that could not be fully repaired in a repair cafe, showing an average of 18% not fully repaired. Breakdown into age of bicycle shows little difference in how easy it is to repair. Each icon represents 25 bikes.
A very quick pictogram 📊 for this week's #TidyTuesday data about repair cafes 🛠️ where I focused in on how easy it is to repair (old) bicycles 🚲
Code: github.com/nrennie/tidy...
#RStats #DataViz #ggplot2
Any recommendations for a (recent-ish) workflow or tutorial for mixed-effects modelling of experience sampling data in R? Looking for something to hand to my master's student.
If you like data, or making things with textile crafts (sewing, knitting, crochet, weaving, etc.), I've finally written something up about data visualization with textiles. More words coming over the next few months! #DHmakes
Profuse apologies for the profuse use of em-dashes.
Prompted by common sightings of smooth lines in the wild.
kerrykolosko.com/smoothed-lin...
Celebrating the draft FDA Bayesian guidance document with our perspective in @jama.com. Honored to co-author w/ Jack Lee (MD Anderson), Lisa LaVange (past director of Office of Biostatistics FDA CDER & president of ASA), & my Bayesian inspiration Sir David Spiegelhalter jamanetwork.com/journals/jam...
Part 2 of my shrinkage estimator series is out! Part 1 covered the univariate case, but now we dive into multivariate shrinkage 🤓
We cover Spearman's classic correlation disattenuation formula, multivariate James-Stein estimators, and hierarchical methods too
haines-lab.com/post/how-to-...
I built a CLI tool for analyzing dbt project lineage, with reference to best practices for building AI agent-friendly CLIs.
dlin parses SQL directly, so no manifest.json, dbt compile, or Python is required.
Written in Rust, single binary. Outputs JSON, Mermaid, DOT and more.
github.com/eitsupi/dlin
Nice paper on effective sample size for Kaplan-Meier survival curve estimates: www.tandfonline.com/doi/full/10.... #Statistics #StatsSky
I’m talking to some undergrads next week and trying to compile a list of the best ways to tap into the data world. #datasky what are the most up-to-date Bsky starter packs and lists these days? I lost track once I got my feed to a good spot
Screenshot of a html report with an embedded table and a separate button to download the data contained in that table.
Cool little package on CRAN for adding a download data button into html files created from your Rmd.
cran.r-project.org/web/packages...
Line chart with four coloured lines, with added symbols and direct labels for accessibility.
A new, work-in-progress #RStats package which
📊 Tries to automatically choose the best chart type from data types and values
🎨 Uses accessible colours, with added labels and shapes
📈 Has cleaner, more readable default #ggplot2 styling
in just one line
ggauto(df$v1, df$v2, df$v3)
#DataViz
I made a tiny tool for quickly sharing small datasets (< ~1000 rows) without uploading any data to a server.
🔗 ziptbl.com
It compresses the data into the link itself, so there’s no account, hosting, or storage layer involved.
Here's Florence Nightingale's famous 📊 data:
ziptbl.com#d=eNpdlE-LGz...
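The "data lives in the link" idea can be sketched roughly as follows. This is a minimal illustration, not ziptbl's actual format: the post doesn't document the encoding, so the use of zlib plus URL-safe base64, the `#d=` fragment handling, and the function names `encode_table` and `decode_table` are all assumptions made for the example.

```python
import base64
import zlib

PREFIX = "#d="  # fragment key borrowed from the sample link; the real format is an assumption


def encode_table(csv_text: str) -> str:
    """Compress CSV text and pack it into a URL fragment (hypothetical scheme)."""
    packed = zlib.compress(csv_text.encode("utf-8"), 9)
    token = base64.urlsafe_b64encode(packed).decode("ascii").rstrip("=")
    return PREFIX + token


def decode_table(fragment: str) -> str:
    """Reverse encode_table: strip the prefix, restore padding, decompress."""
    token = fragment[len(PREFIX):] if fragment.startswith(PREFIX) else fragment
    token += "=" * (-len(token) % 4)  # base64 padding was stripped when encoding
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")


if __name__ == "__main__":
    csv_text = "month,deaths\nApr 1854,5\nMay 1854,9\n"  # made-up rows, not Nightingale's data
    link = "https://example.com/" + encode_table(csv_text)
    print(link)
    print(decode_table(link.split("/")[-1]) == csv_text)
```

Because the fragment after `#` is never sent to the server, a scheme like this needs no storage layer, which would also explain the rough row limit: the whole table has to fit in a URL.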
This video is *bananas*: you can estimate π from a ton of random coin flips bc if you keep flipping until more than 50% of the flips are heads, the expected prop of heads/total = π/4
Here's an #rstats version I made of Matt's Python code in the video gist.github.com/andrewheiss/...
With 10 million random flips I got 3.131381
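For anyone who wants to try the coin-flip trick, here is a rough Python sketch. It is not Matt's code or the #rstats gist, just one way to set it up: each trial flips until heads outnumber tails, records heads/flips, and the average of those ratios is multiplied by 4. The `max_flips` cap, the function names, and the default trial count are assumptions of this sketch; the cap is there because the stopping time has an infinite mean, so a rare trial could otherwise run for a very long time.

```python
import math
import random


def heads_ratio(rng: random.Random, max_flips: int) -> float:
    """Flip a fair coin until heads outnumber tails; return heads / flips.

    If the cap is reached first, return the ratio at the cap (at most 0.5).
    """
    heads = 0
    for flips in range(1, max_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1
        if 2 * heads > flips:  # heads are now more than 50% of the flips
            return heads / flips
    return heads / max_flips


def estimate_pi(trials: int = 20_000, max_flips: int = 2_000, seed: int = 1) -> float:
    """Average the per-trial ratios and scale by 4."""
    rng = random.Random(seed)
    mean_ratio = sum(heads_ratio(rng, max_flips) for _ in range(trials)) / trials
    return 4 * mean_ratio


if __name__ == "__main__":
    print(estimate_pi(), math.pi)
```

Half of all trials stop after a single flip with a ratio of 1, and the long tail of slow trials drags the average down toward π/4. Capping the slow trials biases the estimate slightly low, so expect agreement to only a couple of decimal places.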
This is the best product review I’ve read: samhenri.gold/blog/2026031...
It doesn’t matter if you’re interested in the MacBook Neo or not; this is the kind of essay that makes you think about the potential of technology and the joys of exploration.
Many journals require data and code sharing. Yet code is still rarely shared and datasets are often hard to reuse.
Our new paper introduces the @sortee.bsky.social guidelines for data & code quality control in ecology & evolution, developed by 26 experienced data editors.
📄 doi.org/10.24072/pcj...
Thank you for citing #tidyplots 🙏
Lalichat Ariyakulkiat et al. Smoke water promotes root meristem activity and cell elongation but inhibits root hair growth under phosphorus deficiency in rice. South African Journal of Botany (2026).
doi.org/10.1016/j.sa...
#rstats #dataviz #phd
Omg I'm so using this - I'm pretty sure we debug functions the same way 😅
added it to github.com/tjmahr/WrapR...
I had (kinda jokingly?) wondered if there was a Skill that helps write Skills... and there is!
From the Anthropic Skills marketplace: github.com/anthropics/s...
Great for iterating over many similar tables, otherwise ehhh
Who needs PowerPoint when you have Quarto and @emilhvitfeldt.bsky.social's extensions?!
I wanted to make an image-heavy presentation. Usually, I'd reach for Keynote.
But Emil shared quarto-revealjs-editable, which let me stay in Quarto land 🥰 Thanks, Emil!
Find it here: github.com/EmilHvitfeld...