Python implementation of `rdrobust` that is significantly faster! github.com/leostimpfle/...
These runtimes are on a log scale!!
been trying to get my claw to teach me Lean - send PRs cuz misery loves company
github.com/apoorvalal/l...
sbi v0.26.1 is out! We initially planned this release for January, but then the Grenoble Hackathon and GSoC applications happened. Now we have three new methods, better neural nets, cleaner internals, better docs, and 9 new contributors.
Highlights below 🧵
If you work on very large and very sparse fixed effects problems as in worker-firm panels, or doctor-patient relations, we'd love to learn if it works for you / speeds up your regressions!
There is also long-form documentation of the new algorithmic strategy github.com/py-econometr... plus a new vignette in which we try to explain why our initial "vanilla MAP" algorithm struggled so much on "hard" fixed effects problems: pyfixest.org/explanation/...
You can find the repo here: github.com/py-econometr...
You can access the new algo via the `demeaner_backend = "rust-cgs"` option of `pf.feols()`, and we have published it as a standalone Rust crate (including Python bindings). R bindings are WIP and we could use some help, even more so with Stata!
Fixed effects are 'sparse' if each observation belongs to only a few groups. In a worker-firm panel, this means that most workers only work at a single firm and (almost) never change their employer.
I am very excited that PyFixest 0.50.0 is on PyPi, including a new graph-based solver for demeaning that makes fixed effects estimation in PyFixest significantly faster for "sparse" fixed effects structures.
This looks really good! Would you be open to me adding a pyfixest fitting skill? Also, have you found a good way to make the docs more LLM-accessible, e.g. by providing md twins for each html page or an llms.txt? Do you think that might be valuable? github.com/NVIDIA/sphin...
MacKinnon, Nielsen and Webb have also suggested a "fast" cluster jackknife - I implemented it a long time ago in `summclust`. Description of the algo here: s3alfisc.github.io/summclust/ar...
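For intuition, one common leave-one-cluster-out (CV3) variant can be sketched by brute force in numpy - this is a toy illustration centered at the full-sample estimate, not summclust's fast implementation (which avoids re-fitting G times):

```python
import numpy as np

def cv3_jackknife(X, y, cluster):
    """Brute-force cluster jackknife: re-fit OLS leaving out one
    cluster at a time and use the spread of the estimates.
    (summclust / MacKinnon-Nielsen-Webb compute this much faster.)"""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    groups = np.unique(cluster)
    G = len(groups)
    betas = np.empty((G, X.shape[1]))
    for i, g in enumerate(groups):
        keep = cluster != g
        betas[i] = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    dev = betas - beta_hat          # deviations from full-sample estimate
    V = (G - 1) / G * dev.T @ dev   # one common CV3 variance formula
    return beta_hat, V

# simulate clustered data: 30 clusters, 50 obs each
rng = np.random.default_rng(2)
G, n_per = 30, 50
cluster = np.repeat(np.arange(G), n_per)
u = rng.normal(size=G)[cluster] + rng.normal(size=G * n_per)  # clustered errors
x = rng.normal(size=G * n_per)
X = np.column_stack([np.ones(G * n_per), x])
y = 1.0 + 0.5 * x + u
beta, V = cv3_jackknife(X, y, cluster)
se = np.sqrt(np.diag(V))
```

The O(G) re-fits make this slow for many clusters, which is exactly the cost the "fast" jackknife avoids via rank-one downdates of (X'X).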
In Copenhagen, at least the 6-euro coffee tastes good ...
PyFixest Sprint
We will be doing a 3-day PyFixest sprint in two weeks - are there any features / regression-based methods widely useful to applied researchers that you'd like us to explore / add?
Also shamelessly plugging @mchow.com, as I am wondering if this is potentially already available with quartodoc?
@jacobtomlinson.dev has been working on sphinx llm - core idea: provide a .md file for each page of your docs. Does this really help LLMs navigate the documentation? And what about an llms.txt? If yes, what are the best practices?
github.com/NVIDIA/sphin...
For me it might be "most fixest installs on CI via r2u" π
This is very funny - is it possible to send kudos after burning through a lot of tokens?
(Ok, FixedEffectsModels.jl performs pretty awesomely as well. And we still have quite some catch-up work to do with pyfixest =) )
If you fit hdfe regressions with complex fixed effects structures, fixest is unquestionably the goat in terms of performance.
Plus, there's the performance, which you get via a highly optimized MAP algorithm (with local Irons-Tuck (IT) acceleration, global smoothing, and a range of internal tricks and heuristics; see lrberge.github.io/fixest/refer...) and all the perf improvements that the multiple estimation syntax allows.
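The vanilla MAP (alternating projections) demeaning that these accelerations build on can be sketched in a few lines of numpy - a toy illustration of the idea, not fixest's actual implementation:

```python
import numpy as np

def demean_map(x, fe_ids, tol=1e-10, max_iter=1000):
    """Vanilla MAP: alternately subtract group means for each
    fixed-effect dimension until the variable stops changing.
    (Toy sketch; real implementations add acceleration steps.)"""
    x = x.astype(float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for ids in fe_ids:
            # subtract the current group means of this FE dimension
            n_groups = ids.max() + 1
            sums = np.bincount(ids, weights=x, minlength=n_groups)
            counts = np.bincount(ids, minlength=n_groups)
            x -= (sums / counts)[ids]
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x

# two-way FE example: 5 groups in dim 1, 7 groups in dim 2
rng = np.random.default_rng(0)
f1 = rng.integers(0, 5, 200)
f2 = rng.integers(0, 7, 200)
y = rng.normal(size=200) + 0.5 * f1 + 0.2 * f2
y_dm = demean_map(y, [f1, f2])  # group means of y_dm are ~0 in both dims
```

On "hard" (e.g. very sparse) FE structures this alternation can take many iterations to converge, which is where acceleration tricks and alternative solvers earn their keep.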
I could of course go on here... for more ideas in API design, read the paper!
- Then of course you get regression tables out of the box with etable.
- And "bring your own vcov" functionality with easy plug-ins.
- A new operator for interacting variables with i(var1, var2) (very underappreciated imo!)
- Last, my favorite feature: multiple estimation syntax!
To point out just a few innovations on the API side that imo set the standard for regression packages, you get:
- separation of estimation and inference - and you can adjust inference post-estimation "on the fly"
- a lot of handy post-estimation processing: coef, se, iplot, predict, and much more ..
Strong agree. And do read the paper! It illustrates very nicely how much thought and creativity has gone into designing fixest's API and performance optimizations, which are really the two things that make fixest so great.
The neat thing is that this idea generalizes to regression and is worked out in the paper by Wong et al listed in the dbreg docs.
You are left with a 2x2 "compressed" data matrix of "sufficient statistics" and "weights", which here is just the number of treated and untreated observations. You can now compute an ATE (as sum_Y(1) / w(1) - sum_Y(0) / w(0)) and also SEs.
Core idea: say you want to compute the impact of an RCT on Y. You only need:
- the sum of Y in test and control sum_Y(D)
- the sum of squares of Y in test and control
- the number of treated and untreated units, w(D)
You can get all of it in a simple group by and sum evaluation.
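The steps above can be sketched in pandas - a minimal illustration of the compression idea with simulated data, not dbreg's API (all names here are mine):

```python
import numpy as np
import pandas as pd

# simulate an RCT with true ATE = 2.0
rng = np.random.default_rng(1)
n = 100_000
df = pd.DataFrame({"D": rng.integers(0, 2, n)})
df["Y"] = 1.0 + 2.0 * df["D"] + rng.normal(size=n)

# one group-by pass yields the sufficient statistics:
# sum of Y, sum of Y^2, and the cell counts w(D)
stats = df.groupby("D")["Y"].agg(
    sum_Y="sum",
    sum_Y2=lambda s: (s**2).sum(),
    w="count",
)

mean = stats["sum_Y"] / stats["w"]                                  # group means
var = (stats["sum_Y2"] - stats["w"] * mean**2) / (stats["w"] - 1)   # group variances

ate = mean[1] - mean[0]                                        # difference in means
se = np.sqrt(var[1] / stats["w"][1] + var[0] / stats["w"][0])  # Neyman-style SE
```

The 100k-row frame collapses to two rows of sufficient statistics, and everything downstream runs on that tiny table - which is why pushing the group-by into a database backend pays off so dramatically.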
Things are grim. But in more frivolous news...
@jamesbrandecon.bsky.social and I have been chipping away at `dbreg`, a 📦 for running big regression models on database backends. For the right kinds of problems, the speed-ups are near magical.
Website: grantmcdermott.com/dbreg/
#rstats
[1/2]
Cool! I have many questions: Is this in Python? Do you have a GitHub repo? And would you be up to contributing to moderndid?