Recording of my talk on {collapse} and the {fastverse} at the Bank of Portugal's workshop "Speeding up Empirical Research: Tools and Techniques for Fast Computing" in December is now online: www.youtube.com/watch?v=qO5d...
It includes examples from trade and network processing.
#rstats #DataScience
Posts by Sebastian Krantz
Release blog post is here: sebkrantz.github.io/Rblog/2026/0... #Rstats #DataScience
I’m thrilled to introduce flownet (sebkrantz.github.io/flownet/), a new R package for transport modeling, supporting stochastic or deterministic traffic assignment to large networks, and powerful tools for (multimodal) network processing/simplification: sebkrantz.github.io/Rblog/2026/0... #Rstats
fixest is an R package for fast and flexible econometric estimation, providing a comprehensive toolkit for applied researchers. The package particularly excels at fixed-effects estimation, supported by a novel fixed-point acceleration algorithm implemented in C++. This algorithm achieves rapid convergence across a broad class of data contexts and further enables estimation of complex models, including those with varying slopes, in a highly efficient manner. Beyond computational speed, fixest provides a unified syntax for a wide variety of models: ordinary least squares, instrumental variables, generalized linear models, maximum likelihood, and difference-in-differences estimators. An expressive formula interface enables multiple estimations, stepwise regressions, and variable interpolation in a single call, while users can make on-the-fly inference adjustments using a variety of built-in robust standard errors. Finally, fixest provides methods for publication-ready regression tables and coefficient plots. Benchmarks against leading alternatives in R, Python, and Julia demonstrate best-in-class performance, and the paper includes many worked examples illustrating the core functionality.
arXiv📈🤖
Fast and user-friendly econometrics estimations: The R package fixest
By Bergé, Butts, McDermott
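For readers who haven't tried it, here is a minimal sketch of the multiple-estimation syntax the abstract describes, using R's built-in iris data (the variable choices are purely illustrative; `csw0()` is fixest's cumulative-stepwise operator):

```r
library(fixest)

# One call estimates several models: csw0() adds regressors stepwise
# (starting from none), and the part after | is a fixed effect.
est <- feols(Sepal.Length ~ csw0(Petal.Length, Petal.Width) | Species,
             data = iris)

# etable() renders a publication-style comparison table of all models
etable(est)
```

The single `feols()` call above returns a multiple-estimation object holding three regressions, which `etable()` lays out side by side.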
Happy to receive PRs, of course...
Yeah, that is the frontier. Would be nice to see it implemented in R, but that would need to be by a macroeconomics practitioner who is into that stuff (which I am no longer).
I'm excited to share the release and rOpenSci publication of dfms 1.0 (docs.ropensci.org/dfms), a high-performance, feature-rich implementation of Dynamic Factor Models for R, supporting mixed-frequency estimation and news decomposition for nowcasting. See also the blog post: sebkrantz.github.io/Rblog/
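For a sense of the workflow, a minimal sketch of estimation with dfms, following the package documentation (the dataset `BM14_M` ships with the package; the choices of r = 3 factors and p = 2 lags are illustrative, not a recommendation):

```r
library(dfms)

# BM14_M: monthly macro series included with dfms (Banbura & Modugno data).
# Differencing makes the series roughly stationary before estimation.
mod <- DFM(diff(BM14_M), r = 3, p = 2)  # 3 factors, VAR(2) factor dynamics

summary(mod)   # estimation summary
plot(mod)      # factor estimates
```

Estimation runs via the EM algorithm and handles the missing values in the data automatically.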
I've started a new personal blog focused on research, career reflections, and travel experiences. The first post documents my recent 6-week trip through Southern Africa, from Zanzibar to Cape Town by public transport. Enjoy!
sebkrantz.github.io/blog/posts/f...
Version 0.3.0 of the {dfms} package for dynamic factor modelling in R just made it to CRAN, adding support for monthly + quarterly mixed frequency estimation. This allows for easy business cycle indicator estimation. More at sebkrantz.github.io/dfms/article... and sebkrantz.github.io/dfms/. #rstats
{collapse} 2.1.0 is out! It introduces a new fslice() function (sebkrantz.github.io/collapse/ref...) and a new theory-consistent weighted quantile algorithm (sebkrantz.github.io/collapse/ref...) with excellent statistical properties, plus convenience features such as join requirements: #rstats #DataScience
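A quick sketch of the two additions on built-in data. Caveat: the exact fslice() argument names are from the linked reference pages and may differ slightly from this sketch:

```r
library(collapse)

# fslice(): grouped row-slicing, here the first 2 rows per cylinder class
# (grouping columns passed via ...; n sets rows per group)
fslice(mtcars, cyl, n = 2)

# Weighted quantiles via fquantile(), with (illustrative) weights w
fquantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75), w = mtcars$wt)
```

Both functions operate without creating intermediate copies, in line with collapse's general design.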
Feel free to join the ECA webinar if you want to see some crazy continent-scale spatial economic modelling.
📅 Feb 10 | 14:00-15:30 EAT
Join industry experts as we explore the costs, benefits, and solutions for Africa’s infrastructure development. 🌐💡
🔗 Register now: bit.ly/3PWKynU
So I think for the moment I'll keep the format unless a reviewer demands something different. I think it is simply more transparent, and this is a technical article. There are many benchmarks involving collapse here (github.com/fastverse/fa...), some of which use visual modes of presentation.
And I do have an overall space constraint with this article, which is at 32 pages now. So the only way would be compressing multiple operations in a plot (like duckdb benchmarks). While this may be nice, it does not make for easy syntax comparison and interpretation either.
Ok, thanks for elaborating. I agree that a plot would be nicer, though not necessarily easier to read. Take for example the grouped median benchmark. dplyr's runtime was 5.62s, collapse was 14.6ms - that's a factor ~400. To present that on a plot, it would have to be logarithmic...
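The back-of-envelope behind that factor, from the runtimes quoted above:

```r
# dplyr: 5.62 s, collapse: 14.6 ms = 0.0146 s
speedup <- 5.62 / 0.0146
round(speedup)  # roughly 385, i.e. a factor of ~400
```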
Ok, thanks. And that's interesting. What do you find difficult about them?
The issue I have with plots is that they consume more space and show only one kind of information, whereas the tables convey at least three useful pieces of information: average runtime, median runtime, and memory consumption.
The {collapse} (@rcollapse.bsky.social) arXiv paper has just been updated - following extensive revision: arxiv.org/abs/2403.05038. I believe it is a great resource for anyone doing scientific computing with #rstats.
It's nice to see an increasing number of #rstats packages use {collapse}. A developer focused vignette was long planned and now it is here - with modest advice on writing efficient R package code in general and using {collapse} in particular: sebkrantz.github.io/collapse/art...