
Posts by Noah Haber

~100 downloads of the reproducible manuscript packages so far, and no reports yet of errors or of anyone being unable to get the package to run, etc.

Looks like things are working, or at least not obviously horribly broken.

Neat.

1 week ago

Wasn't there for the actual replications for SCORE, but I can say a lot of pains were taken to try to match the original designs and sample sizes, with replication sample sizes generally larger than the originals.

Free MetaArXiv paper link below; the appendix describes the replication process in detail osf.io/preprints/me...

1 week ago

Yep, totally agreed, that can be super valuable, as can be critical reproduction approaches.

FWIW, there are some reasons to try to replicate a "poor" design (e.g. separating statistical noise / publication bias from design, particularly if design flaws can be mitigated or bounded), but they're more niche.

1 week ago

What do you mean by "replicate" here?

The papers frame "replication" as using the same methods as the original, but with new data, so it's hard to demonstrate flaws in the methods by repeating them.

Are you thinking of a more critique-oriented approach? Lots of ways to define/frame the word "replicate".

1 week ago

Agreed neither takes priority.

For what it's worth, I tend to take the efficiency argument; methods review is very hard, but it's way less resource intensive than replication.

If we want to sort through a literature, the most efficient path involves triage with methods review first.

1 week ago

Really hard to form a clear picture of what high/low replicability in the literature actually means as a diagnostic tool, and it's definitely not good to use it as a target.

Will throw out there that the *reproducibility* paper has a much, much clearer interpretation.

1 week ago

There is no obviously correct benchmark for what that replication rate should be (and be skeptical of anyone who is sure they know what it is).

Heck, with all the time I've spent in this work, I am still truly not sure if/how much we should be bothered by the rates found.

1 week ago

NHST-defined replication is problematic for sure, but in an NHST-defined literature it's the comparative framework.

Question now becomes: "does the replication rate differ from what people *expect* it would be?"

There's definitely a problem if those don't match, but unclear what the problem is.
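
To put a number on "expect": here's a toy back-of-envelope (every value below is made up for illustration, nothing here comes from the SCORE data). Under NHST, the expected replication rate falls out of the share of tested effects that are real, the power of the originals and replications, and alpha:

```python
# Toy expected replication rate under NHST. All inputs are
# hypothetical, for illustration only.
prior_true = 0.5   # hypothetical share of tested hypotheses that are true
power_orig = 0.8   # hypothetical power of the original studies
power_rep = 0.9    # hypothetical power of the (larger) replications
alpha = 0.05       # significance threshold

# Among *published significant* originals, the fraction that are true positives:
p_sig_orig = prior_true * power_orig + (1 - prior_true) * alpha
frac_true = (prior_true * power_orig) / p_sig_orig

# Expected share of those originals that yield a significant replication:
expected_rate = frac_true * power_rep + (1 - frac_true) * alpha
print(f"expected replication rate: {expected_rate:.0%}")  # ~85% under these assumptions
```

Shift any of those inputs and the "expected" rate moves a lot, which is exactly why there's no obviously correct benchmark.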

1 week ago

Things get a bit weirder when we're thinking about population/literature level. The forces impacting replicability aren't always the things impacting design.

You could think about it as "setting aside design" or "assuming a best case scenario where everything is designed well"*

* editor's note: LOL

1 week ago

And as y'all note, given the lack of a deep general understanding of what replication means (and doesn't mean), there's always the very real risk that folks will take a positive replication as a strong indicator of strength of inference, so replicating a poorly designed study could be worse than doing nothing.

1 week ago

When evaluating a given study, TOTALLY agreed that it's generally not a good idea to attempt to replicate if it's poorly designed. Methods review first, always; it's way less resource-intensive and triages out a lot of the literature.

A positive replication (however defined) alone says little about reliability.

1 week ago

*COI disclosure: I am a coauthor here, mostly doing the analysis design and implementation. I was not at COS when the replication/reproduction work was designed or underway for SCORE.*

Worth exploring the different purposes of replication/replicability for individual papers vs. the literature.

1 week ago

Oh this is VERY good. Always another paper...

1 week ago

Profoundly weird to see these charts in the science mainstream, but if you want to remake them, it's all in the code.

side note: the color palettes for the above-linked reproducibility and replicability papers were based on samples from Weezer album cover art. you're welcome.

2 weeks ago

More detailed explanation + packaged version in the blog-post-to-be, but the short version is that the analysis code generates values for placeholders, and the text has demarcated placeholders contained in curly brackets, so both can be worked on independently.

"Knit" code smooshes the two parts together.

2 weeks ago

Trackdown is great and a VERY similar idea, but it has the big limitation that you cannot simultaneously work on the analysis code/data/etc. and the manuscript narrative text. Editing has to be iterative, one after the other, which was impractical for us.

So we rolled our own (slightly hacky) homebrew solution.

2 weeks ago

Been seeing a few people getting hit with the Nature paywall, so fun fact:

Using the analyst code/data packages above, you can computationally reproduce these papers in their entirety, free and open on osf.io.

(they're also free and open on MetaArXiv osf.io/preprints/me...)

2 weeks ago

Or bonus: you can computationally reproduce several of the papers in their entirety from "scratch" if you like, also free and open (CC0)

bsky.app/profile/whal...

2 weeks ago

Free and open on MetaArXiv

osf.io/preprints/me...

As with (I think) all the other papers, e.g.

osf.io/preprints/me...

2 weeks ago

update: internal muffled scream

2 weeks ago

I hope y'all tear into it and find any lurking issues, questions, ideas for your own work, etc.

Gotta say it's a bit terrifying to have the entire pipeline so fully exposed and accessible, particularly one as complicated and high profile as SCORE.

But more importantly, I hope it's useful.

2 weeks ago

Reproducible manuscripts have been around since code existed, but they are still pretty rare and mostly written in scripting languages (text and everything) like RMarkdown.

For SCORE, the text is written collaboratively via Google Docs, enabling modern collaborative writing with a large team.
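
Rough sketch of how the text side of a pipeline like this can be pulled in programmatically. The plain-text export endpoint is a standard Google Docs feature for docs shared for viewing; the document ID below is a placeholder, not the real manuscript:

```python
import urllib.request

# Placeholder ID; any Google Doc shared for viewing exposes this export URL.
DOC_ID = "your-google-doc-id"
EXPORT_URL = f"https://docs.google.com/document/d/{DOC_ID}/export?format=txt"

with urllib.request.urlopen(EXPORT_URL) as response:
    manuscript_text = response.read().decode("utf-8")

# manuscript_text now holds the collaboratively written narrative,
# curly-bracket placeholders and all, ready for the knit step.
```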

2 weeks ago

Then there is rigor and accountability. Anyone can trace every stat and figure to *exactly* how it was estimated.

If there's an error or a way you think things should have been done instead, it's super findable.

As a bonus, that incentivizes code readability, since people might use it.

2 weeks ago

Reproducible manuscripts are useful for a few reasons:

One of the best reasons is just practicality. The thing most of us do, where we manually update every stat/figure/etc. in a manuscript every time something changes in the code or data, is super error-prone and time-consuming.

2 weeks ago

Code and data (both for the analyses and for the knit itself) are of course available and linked in the papers.

For the paper on reproducibility, see here osf.io/ed8pj/overview

For the paper on replicability, see here
osf.io/g5sny/overview

Code/data are CC0 1.0 Universal, enjoy!
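
If you'd rather fetch those programmatically than click around, here's a sketch against OSF's public v2 API (the endpoint is OSF's documented API; ed8pj is the reproducibility project linked above):

```python
import requests

# List the files stored on the reproducibility paper's OSF project (ed8pj).
resp = requests.get("https://api.osf.io/v2/nodes/ed8pj/files/osfstorage/")
resp.raise_for_status()

for item in resp.json()["data"]:
    attrs = item["attributes"]
    # Each item is a file or folder; download links live under item["links"].
    print(attrs["kind"], attrs["name"])
```

Swap in g5sny for the replicability project.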

2 weeks ago

Had a small part of this that I am pretty proud of:

Full push-button reproducible manuscripts, from raw data to final text.

Nearly every stat, figure, number, table, etc. is pulled directly from the analysis code and data and "knit" with manuscript text written in Google Docs.

Blog post coming soon ...

2 weeks ago

To the eight of you out there whose brains are also damaged in this particular way, thank you/I am sorry.

7 months ago

"Elapsed time" is to Intention to Treat as "Moving time" is to Per-Protocol.

I will not explain any further.

7 months ago

It's giving Borne vibes...

8 months ago

A few of the first attempts to recruit US-based journals have been met with a "times are too uncertain to take on a project like this," as expected.

Small potatoes in the scheme of things, but just another example of the astounding destruction of scientific progress happening right now.

8 months ago