Posts by Dan Goldstein
Which chart types performed well / poorly?
Nearly 30 screwdrivers
My dad inherited my grandfather's tools. They seem to have accumulated screwdrivers at the rate of 1 per 6 years.
Exactly! The underwater data centers of the 2010s weren't conceived because computers needed cooling but because it was a fun idea, like underwater restaurants.
Beach glass is just pollution with good branding
I've enjoyed discussing this with you in all honesty. You're a thoughtful person.
Technology and acceptable practice evolve together, and new incentives become necessary over time. The replication crisis happened because people thought sloppy practices wouldn't be detected and they wouldn't be held accountable. Now there's an incentive to bind yourself via pre-registration.
We saw this knot on the sidewalk in Little Tokyo, Los Angeles and tied it at home. Carrick bend, apparently.
Yes, the incentive for the individual researcher is that they won't get their paper published, or will suffer reputational harm, if they publish a p-hacked or non-generalizable result.
Minecraft OSHA violation
The incentives need to be on humans not to publish things that are p-hacked or not generalizable. Human beings will then use AI to ensure that their name is not attached to shoddy work. We are in charge of AI; we can make it do whatever we want. If it doesn't do what we want, we can retrain and repair it.
There must also be incentives: penalties for having p-hacked or unrobust results with your name on them. AI can help you cheat, and AI can help you do better and more generalizable science. It's about incentives, what we're rewarding, and what the review process is promoting.
Once things become engineering problems, they see progressive improvement. AI on the reviewing side can test robustness under alternative specifications, look for evidence of p-hacking, etc.
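As a toy example of the kind of screen a reviewing tool could run, here's a crude p-curve-style check in Python. The function name and the input p-values are hypothetical, and this is a simplified sketch of the idea behind p-curve analysis, not the real method: among significant results, genuine effects pile up at very small p-values, so bunching just under .05 is a red flag.

```python
import numpy as np
from scipy.stats import binomtest

def phack_screen(pvalues):
    """Compare significant p-values landing just under .05 vs. below .01.
    A healthy literature piles up at tiny p; bunching near .05 is suspicious.
    Illustrative sketch only; assumes at least one p-value in either bin."""
    p = np.asarray(pvalues)
    near = int(np.sum((p > 0.04) & (p <= 0.05)))  # suspiciously close calls
    tiny = int(np.sum(p <= 0.01))                 # strong results
    # One-sided binomial test: are "close calls" overrepresented?
    result = binomtest(near, near + tiny, 0.5, alternative="greater")
    return near, tiny, result.pvalue

# Hypothetical p-values pulled from one paper's significant findings:
print(phack_screen([0.049, 0.043, 0.048, 0.008, 0.041]))
```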
AI can exert effort human reviewers won't, extending their abilities. The rest is incentives.
I think AI is more likely to tell people they’re misusing stats than to get people to misuse them. People have generally been proceeding with little understanding and the only way is up.
Sounds like an excellent middle school
And, more generally, we applied consistent terminology:
* Reproducibility = same data, same analysis
* Robustness = same data, different analyses
* Replicability = same question, different data
These are examples of repeatability, and are components of the broader concept of credibility.
Also different topics, e.g. reproducibility vs. replication of results.
p-hacking is more associated with experiments. A fix for p-hacking is pre-registration of the experimental methods and analysis.
But if you are only doing analysis of observational data, you're quite unconstrained. There's not even a set of experimental results to constrain what you try.
I was too lazy to add them but the references are in the last post
You have a data set and you torture it until it confesses a publishable story.
You literally run loops testing millions of assumptions, exclusions, and model specifications. You end up in a subspace where things look pretty robust, but that subspace is just 5% of the reasonable analyses.
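A minimal sketch of that loop in Python, on simulated data where the "treatment" truly has no effect (all variable names are made up for illustration): sweep outlier exclusions and control sets, then keep only the specifications where the null effect comes out significant.

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                              # "treatment" with no real effect
controls = {f"c{i}": rng.normal(size=n) for i in range(4)}
y = rng.normal(size=n)                              # outcome is pure noise

significant, total = [], 0
for cutoff in (None, 2.0, 2.5, 3.0):                # alternative outlier exclusions
    keep = np.ones(n, bool) if cutoff is None else np.abs(y) < cutoff
    for k in range(len(controls) + 1):              # alternative control sets
        for subset in itertools.combinations(controls, k):
            cols = [x[keep]] + [controls[c][keep] for c in subset]
            X = sm.add_constant(np.column_stack(cols))
            p = sm.OLS(y[keep], X).fit().pvalues[1]  # p-value on the null "treatment"
            total += 1
            if p < 0.05:
                significant.append((cutoff, subset, round(p, 3)))

# Reporting only this subspace and calling it "robust" is analysis hacking.
print(f"{len(significant)} of {total} specifications give p < 0.05")
```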
I think if it's public archival data and they've submitted the code, it's easily reproduced. Other fields haven't been held to submitting their data and code.
It might have to do with the kind of data used? Psych data was stored on local computers and got lost over time. Econ data is often a mix of public and private and the private can be sensitive. But I would imagine poli sci data is generally public and easy to re-find years later.
References:
www.nature.com/articles/d41...
www.nature.com/articles/s41...
www.nature.com/articles/s41...
www.nature.com/articles/s41...
My subjective opinion:
- Stereotypes of psych are colored by a few data tampering stories
- Econ uses archival data that is tamperproof
- Econ trouble happens through analysis hacking
- Econometrics is no more legit than Psychometrics
- Cognitive & bio-psych are meticulous
Rates of successful replications were similar in Econ and Psych (though the denominators are small).
"Papers are weighted combinations of claims accounting for multiple claims per paper replicated in some cases"
Reanalyses of experimental data were more likely to yield consistent results than reanalyses of observational data.
(Analysis hacking was always a bigger issue than p hacking in my opinion)
Econ had about the same rate of inferentially robust analyses as Psych.
Econ had about the same rate of reanalyses reaching the same conclusion as Psych.
Nature meta-research project puts claims in social-science papers to the test. Refs in the last post.
I'm interested in Econ and Psych so I focused on that:
Econ had about the same rate of "not reproducible" analyses as Psych and a worse rate than Political Science.