Posts by Dan Goldstein
Which chart types performed well / poorly?
Nearly 30 screwdrivers
My dad inherited my grandfather's tools. They seem to have accumulated screwdrivers at the rate of 1 per 6 years.
Exactly! The underwater data centers of the 2010s weren't conceived because computers needed cooling but because it was a fun idea, like underwater restaurants.
Beach glass is just pollution with good branding
I've enjoyed discussing this with you in all honesty. You're a thoughtful person.
Technology and acceptable practice evolve together, and new incentives become necessary over time. The replication crisis happened because people thought sloppy practices wouldn't be detected and they wouldn't be held accountable. Now there's an incentive to bind yourself via pre-registration.
We saw this knot on the sidewalk in Little Tokyo, Los Angeles and tied it at home. Carrick bend, apparently.
Yes, the incentive for the individual researcher is that they won't get their paper published, or will suffer reputational harm, if they publish a p-hacked or non-generalizable result.
Minecraft OSHA violation
The incentives need to be on humans not to publish things that are p-hacked or not generalizable. Human beings will then use AI to ensure that their name is not attached to shoddy work. We are in charge of AI; we can make it do whatever we want. If it doesn't do what we want, we can retrain and repair it.
There must also be incentives: penalties for having p-hacked or unrobust results with your name on them. AI can help you cheat, and AI can help you do better and more generalizable science. It's about incentives, what we're rewarding, and what the review process is promoting.
Once things become engineering problems, they see progressive improvement. AI on the reviewing side can test robustness under alternative specifications, look for evidence of p-hacking, etc.
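As a toy example of the kind of screen a reviewing tool could run, here's a crude p-curve-style check in Python. The function name and the input p-values are hypothetical, and this is a simplified sketch of the idea behind p-curve analysis, not the real method: among significant results, genuine effects pile up at very small p-values, so bunching just under .05 is a red flag.

```python
import numpy as np
from scipy.stats import binomtest

def phack_screen(pvalues):
    """Compare significant p-values landing just under .05 vs. below .01.
    A healthy literature piles up at tiny p; bunching near .05 is suspicious.
    Illustrative sketch only; assumes at least one p-value in either bin."""
    p = np.asarray(pvalues)
    near = int(np.sum((p > 0.04) & (p <= 0.05)))  # suspiciously close calls
    tiny = int(np.sum(p <= 0.01))                 # strong results
    # One-sided binomial test: are "close calls" overrepresented?
    result = binomtest(near, near + tiny, 0.5, alternative="greater")
    return near, tiny, result.pvalue

# Hypothetical p-values pulled from one paper's significant findings:
print(phack_screen([0.049, 0.043, 0.048, 0.008, 0.041]))
```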
AI can exert effort human reviewers won't, extending their abilities. The rest is incentives.
I think AI is more likely to tell people they’re misusing stats than to get people to misuse them. People have generally been proceeding with little understanding and the only way is up.
Sounds like an excellent middle school
And, more generally, we applied consistent terminology:
* Reproducibility = same data, same analysis
* Robustness = same data, different analyses
* Replicability = same question, different data
These are examples of repeatability, and are components of the broader concept of credibility.
Also different topics, e.g. reproducibility vs. replication of results.
p-hacking is more associated with experiments. A fix for p-hacking is pre-registration of the experimental methods and analysis.
But if you are only doing analysis of observational data, you're quite unconstrained. There's not even a set of experimental results to constrain what you try.
I was too lazy to add them but the references are in the last post
You have a data set and you torture it until it confesses a publishable story.
You literally run loops testing millions of assumptions, exclusions, and model specifications. You end up in a subspace where things look pretty robust, but that subspace is just 5% of the reasonable analyses.
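A minimal sketch of that loop in Python, on simulated data where the "treatment" truly has no effect (all variable names are made up for illustration): sweep outlier exclusions and control sets, then keep only the specifications where the null effect comes out significant.

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                              # "treatment" with no real effect
controls = {f"c{i}": rng.normal(size=n) for i in range(4)}
y = rng.normal(size=n)                              # outcome is pure noise

significant, total = [], 0
for cutoff in (None, 2.0, 2.5, 3.0):                # alternative outlier exclusions
    keep = np.ones(n, bool) if cutoff is None else np.abs(y) < cutoff
    for k in range(len(controls) + 1):              # alternative control sets
        for subset in itertools.combinations(controls, k):
            cols = [x[keep]] + [controls[c][keep] for c in subset]
            X = sm.add_constant(np.column_stack(cols))
            p = sm.OLS(y[keep], X).fit().pvalues[1]  # p-value on the null "treatment"
            total += 1
            if p < 0.05:
                significant.append((cutoff, subset, round(p, 3)))

# Reporting only this subspace and calling it "robust" is analysis hacking.
print(f"{len(significant)} of {total} specifications give p < 0.05")
```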
I think if it's public archival data and they've submitted the code, it's easily reproduced. Other fields haven't been held to submitting their data and code.
It might have to do with the kind of data used? Psych data was stored on local computers and got lost over time. Econ data is often a mix of public and private and the private can be sensitive. But I would imagine poli sci data is generally public and easy to re-find years later.
References:
www.nature.com/articles/d41...
www.nature.com/articles/s41...
www.nature.com/articles/s41...
www.nature.com/articles/s41...
My subjective opinion:
- Stereotypes of psych are colored by a few data tampering stories
- Econ uses archival data that is tamperproof
- Econ trouble happens through analysis hacking
- Econometrics is no more legit than Psychometrics
- Cognitive & bio-psych are meticulous
Rates of successful replications were similar in Econ and Psych (though the denominators are small).
"Papers are weighted combinations of claims accounting for multiple claims per paper replicated in some cases"
Reanalyses of experimental data were more likely to yield consistent results than reanalyses of observational data.
(Analysis hacking was always a bigger issue than p hacking in my opinion)
Econ had about the same rate of inferentially robust analyses as Psych.
Econ had about the same rate of reanalyses reaching the same conclusion as Psych.
Nature meta-research project puts claims in social-science papers to the test. Refs in the last post.
I'm interested in Econ and Psych so I focused on that:
Econ had about the same rate of "not reproducible" analyses as Psych and a worse rate than Political Science.