My NHB paper is literally about an article where problems with IHS transformations revealed data irregularities that (partly) resulted in that article's retraction. That NHB paper doesn't take any stance on IHS specifications, let alone a *pro* stance. 2/
doi.org/10.1038/s415...
Posts by Jack Fitzgerald
Given my stance on log-like specifications, I was surprised to learn that there's a 'news' article on my paper in Nature Human Behaviour, claiming that it actually advocates for the use of IHS specifications. This is categorically untrue. 1/
t.co/ZGjAmHZhLv
But for full details, nothing will beat reading the paper. Give it a look! 18/
doi.org/10.31222/osf...
For the on-the-go researchers out there, we’ve made teaching slides to make the paper’s findings more digestible. 17/
jack-fitzgerald.github.io/files/Log-Li...
Huge shoutout to the rest of the research team who made this possible: @jopieboy.bsky.social, @fialalenka.bsky.social @essieconomist.bsky.social, and @davidvalenta.bsky.social. 16/
We have a couple of recommendations in the paper on how to deal with the logs-with-zeros problem. But our biggest piece of advice is this: 🛑stop🛑 using log-like specifications. They are actively polluting the literature with spuriously significant results. 15/
Consequently, we find that log-like specifications in our replication sample are statistically significant 40-49% more frequently than in the general causal economics literature, and published test statistics are *really* likely to be just beyond 5% significance thresholds. 14/
You don’t need p-hacking for this to cause problems. If researchers file-drawer statistically insignificant results, or if journals select for statistically significant ones, then the most spuriously significant log-like specifications can become overrepresented in the literature. 13/
This happens because messing with unit scale (or the c in ln(Z+c)) allows you to overfit the data. In sample-split simulation data, the log-like specifications that yield the most spuriously significant results within-sample have the worst out-of-sample predictability. 12/
We show this in simulation evidence: even with a placebo treatment and an outcome made of random noise, you get a >30% increase in rejection rates by mining over unit scalings. We also observe sweet spots in ~21% of our simulation draws. 11/
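A toy version of that exercise (my own sketch, not the paper's code, data, or numbers): regress pure-noise outcomes on a placebo treatment under ln(Z·s + 1), once at a fixed unit scale and once keeping the best |t| across a grid of scalings. Since the mined statistic is the max over the grid, mining can only push the rejection rate up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, draws = 200, 500
scales = 10.0 ** np.arange(-3, 4)   # candidate unit scalings, 0.001x to 1000x

def abs_t(y, d):
    """|t| on the slope from an OLS of y on a constant and d."""
    X = np.column_stack([np.ones_like(d), d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return abs(beta[1]) / np.sqrt(cov[1, 1])

fixed = mined = 0
for _ in range(draws):
    d = rng.integers(0, 2, n).astype(float)  # placebo treatment: no true effect
    z = np.exp(rng.normal(size=n))           # outcome is pure noise
    z[rng.random(n) < 0.3] = 0.0             # with a mass of zeros
    ts = [abs_t(np.log1p(s * z), d) for s in scales]
    fixed += ts[3] > 1.96                    # commit to the original scale (s = 1)
    mined += max(ts) > 1.96                  # report the 'best' scale instead
print(fixed / draws, mined / draws)          # mined rate can never fall below fixed rate
```

The fixed-scale rejection rate should hover near the nominal 5%; the mined rate sits weakly above it by construction, which is the mechanics behind the inflation described above (the paper's own magnitudes come from its simulations, not this sketch).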
This creates a multiple hypothesis testing problem. There’s no ‘right/wrong’ scale in which to measure a variable and no ‘right/wrong’ constant c to add in ln(Z+c). So you get an infinite number of tests that are all equally valid in theory, yet generally yield different results. 10/
We also discovered that in ln(Z+c) specifications, you can get sweet spots both in unit scale and in constant c. 9/
We discovered that t-statistics in log-like specifications can be non-monotonic in unit scale, creating local optima in t-statistics that can briefly dip into rejection regions. This doesn’t just matter for point estimates: it affects studies’ entire conclusions. 8/
Two of our robustness checks involved scaling variables up or down by a factor of 1000 before transformation. For 38% of estimates, *both* of these checks shrunk t-statistics. This pointed us to the existence of what we call ‘sweet spots’. 7/
These specifications are *really* non-robust. Simply removing the log-like transformation changes 36% of conclusions and flips 12% of estimates to statistically significant coefficients of the opposite sign. Other checks change conclusions for 14-36% of estimates. 6/
We re-analyzed replication data from 46 papers whose main findings are defended by log-like specifications. Using ceteris paribus robustness checks that change one design choice at a time, we find widespread non-robustness and publication bias in these specifications. 5/
Chen & @jondr44.bsky.social (2024, QJE) show that you can get coefficients of any magnitude you want by adjusting the scale of transformed variables before transformation. (Semi-)elasticities and percentage effects should never have this property. 4/
doi.org/10.1093/qje/...
Many recent papers highlight identification problems that arise because these specifications’ results depend on the unit scale of transformed variables. So e.g., regressions on ln(dollars + 1) will give you different coefficients and t-statistics than ln(cents + 1). 3/
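To make that scale dependence concrete, here's a minimal sketch on hypothetical data (my own illustration, not the paper's replication sample): the same OLS regression run on ln(dollars+1) and ln(cents+1) yields different slope coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                          # regressor
dollars = np.exp(0.5 * x + rng.normal(size=n))  # positive outcome
dollars[rng.random(n) < 0.2] = 0.0              # 20% zeros, as in expenditure data

def slope(y, x):
    """OLS slope of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

b_dollars = slope(np.log1p(dollars), x)      # ln(dollars + 1)
b_cents = slope(np.log1p(100 * dollars), x)  # ln(cents + 1): same data, new units
print(b_dollars, b_cents)                    # different 'semi-elasticities'
```

With pure ln(Z), rescaling only shifts the intercept (ln(cZ) = ln c + ln Z), so the slope is unit-invariant; ln(Z+1) loses that property because the +1 is not rescaled along with Z.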
If you have 0s in your data, you can’t run log specifications without dropping those observations, because ln(0) is undefined. So many researchers replace the log transformation with the ln(Z+1) or inverse hyperbolic sine (IHS) transformations, which look like ln(Z) for large Z but are defined at 0. 2/
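A quick numerical check of the above (my own sketch): numpy's `log1p` implements ln(Z+1) and `arcsinh` implements the IHS, ln(Z + √(Z²+1)). Both are finite at 0, and for large Z they track ln(Z) (the IHS with a constant offset of ln 2).

```python
import numpy as np

z = np.array([0.0, 1000.0])

log1p = np.log1p(z)    # ln(Z + 1): finite at Z = 0
ihs = np.arcsinh(z)    # ln(Z + sqrt(Z^2 + 1)): also finite at Z = 0

print(log1p[0], ihs[0])         # 0.0 0.0 -- whereas ln(0) is undefined
print(np.log(z[1]), log1p[1])   # ln(1000) vs ln(1001): nearly identical
print(ihs[1] - np.log(z[1]))    # IHS - ln(Z) approaches ln(2) ~ 0.693 for large Z
```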
New preprint! We reanalyze 46 papers that use log-like specifications (ln(Z+1), inverse hyperbolic sine etc). We find widespread non-robustness, and we show through theory + simulation how these models drive spurious significance. 1/
doi.org/10.31222/osf...
We're thrilled to open registration for the Utrecht Replication Games. The event will be held at Utrecht University on June 4th. Psych, public health, pol sci and econ studies will be reproduced!
Register here: www.surveymonkey.ca/r/Replicatio...
When your spam targets won't submit so you *demand submission*
Wishing y'all luck today
Had a wonderful time organizing the scientific side of the CBS Replication Games! Thank you to the replicators for your hard work!
For those without institutional access to NHB, Nature has provided the following link, from which you can access my Matters Arising for free: rdcu.be/eYab2
27/27
I want to thank the editors of @nathumbehav.nature.com for taking this matter seriously and keeping in touch with me over the past 21 months. Their commitment to open science made this correction possible. 26/x
In addition to the lives at stake, governments spend hundreds of billions each year on counterterrorism. To determine what policies best deter terrorism is to answer a trillion-dollar question. Unfortunately, this study categorically cannot answer that question. 25/x
For example: arrest rates are computed as (arrests/attacks), but countries can have more arrests than attacks. In most country-years, arrest rates are either >100% or a positive number divided by 0. Belgium apparently had a terrorism arrest rate of 16,600% in 2018. 24/x
I don’t have space in the 1200-word limit of a Matters Arising to cover everything wrong I found with this paper. The more you look, the more you find. Many ‘minor’ details that would be worth a comment in their own right are relegated to the Supplementary Material. 23/x
[Image: line graphs displaying time series of the raw number of terrorist attacks over time in each of 28 EU member states, based on WEA's data. A red box highlights the time series for the United Kingdom.]
The original paper explicitly highlights how much terrorism ‘declined’ in the UK after the COVID-19 pandemic. But the decline after 2020 is only that stark because all terrorism-related variables are imputed to 0, without disclosure, after the UK left the EU in 2020. 22/x
This means that the paper’s panel dataset on terrorism (enforcement) in the EU included country-years where the country *wasn’t part of the EU at all* (not yet a member, or no longer one), assigning all terrorism-related variables to zero for these country-years without telling anyone. 21/x