Given how machine learning practices have evolved in the last decade, I was very curious if these findings would hold up. Many thanks to Florian for his nuanced and thorough review.
Romantic desire: not predictable in 2017, not predictable in 2026.
Note: An earlier version of this thread described the problems with the ML techniques as severe, but this was misleading. These problems could have become severe only in the absence of the described countermeasures.
The full rationale, including the review and author response, is available on PsyArXiv: osf.io/hxnum
The study materials and all new materials generated as part of the review are on OSF (osf.io/y3hxg/) and GitHub (github.com/FlorianParge....).
Original article available here: doi.org/10.1177/0956...
This shows how replications can function as a safety net not just for random fluctuation, but also for problems with methods that were not known at the time.
Best practices evolve, and in machine learning especially, avoiding overfitting can be difficult. Simple, easy-to-explain techniques like splitting the sample, as well as straightforward replications, are easy by comparison and will remain a best practice.
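To illustrate why sample splitting matters: a flexible model evaluated on its own training data can look impressively predictive even when there is nothing to predict. A minimal sketch (using scikit-learn on synthetic noise data; this is illustrative only, not the study's actual pipeline or models):

```python
# Illustration: inflated in-sample performance vs. honest held-out performance.
# The outcome y is pure noise, unrelated to the predictors X, so any apparent
# predictability on the training data is overfitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # 50 noise predictors
y = rng.normal(size=200)         # outcome unrelated to X

# The safeguard: hold out a test set before fitting anything.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

print("train R^2:", round(r2_score(y_train, model.predict(X_train)), 2))  # inflated
print("test  R^2:", round(r2_score(y_test, model.predict(X_test)), 2))    # near zero or negative
```

The training R² looks substantial while the held-out R² hovers around zero, which is exactly the gap a train/test split (or a self-replication in a fresh sample) is designed to expose.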
By doing training/test set splits and conducting a straightforward validation of their results in a second sample, the authors reined in the bias.
However, he also found that the machine learning techniques used, state-of-the-art at the time, could lead to inflated performance estimates. This could have led to bias had the authors not conducted self-replication.
Florian found some minor transcription errors and improvable documentation practices (unsurprising given the age of the work and the studies involved).
We are grateful to Florian Pargent for his in-depth review and reanalysis and to @datingdecisions.bsky.social @pauleastwick.bsky.social @elijfinkel.bsky.social for their willingness to have their impactful work scrutinised for errors.
New report: Joel, Eastwick, & Finkel (2017) “Is Romantic Desire Predictable? Machine Learning Applied to Initial Romantic Attraction”. Based on the review by @florianpargent.bsky.social, we find Minor Errors that do not affect the core conclusions of the manuscript. osf.io/hxnum
This is a follow-up on bsky.app/profile/erro...
News from scientific self-correction: Authors pushing to get errors in their papers corrected. Lukas continues to be a role model for how scientists should handle post-publication peer review.
Improving scientific practice can seem daunting. In this fantastic talk (and thread below), Julia Rohrer shares practical ways to communicate methodological insights to a wider audience of researchers.
At ERROR, we cannot compete with million-dollar bounties for whistleblowers. But it is great to see sleuthing work rewarded, and institutions admitting when their researchers engaged in misconduct.
Post-publication peer review is at its best when it's thoughtful, scrupulous, steeped in detail – and challenges key claims of the paper. @janhove.bsky.social's discussion of a recent paper on multilingualism exemplifies this.
Metascientists step up as role models for a healthy error culture in science. Here is a great case where an author and a critical reader collaborated to set the record straight.
Voluntary retraction remains a key way to put scientific self-correction into practice. Zhu and Holmes (2024) did the right thing when they realized that some of their results were based on a coding error.
Original: psycnet.apa.org/fulltext/202...
With retraction: psycnet.apa.org/fulltext/202...
Many errors remain to be found in clinical trials. Patients deserve reliable results. Kudos to these authors for their persistent work to correct the record.
Congratulations to @simine.com for winning the Einstein Foundation Individual Award! 🎉
A well-deserved recognition for her seminal efforts to improve scientific rigor, which includes instituting detailed checks for errors and computational reproducibility at Psychological Science.
I think this is an overly pessimistic take from the @bmj.com.
Sharing data does not inherently increase trust; rather, it enables verification, which allows for trust calibration.
This example is a win. Serious issues were rapidly detected that would not have been without mandatory data sharing.
Synchronous Robustness Reports could explore the implications of different analytical choices – but they could still suffer from bias. Hardwicke argues that preregistration is crucial to prevent such bias.
@tomhardwicke.bsky.social
Are methodological and causal inference errors creating a false impression that the gut microbiome causes autism? In this strong analysis, Mitchell, Dahly, and Bishop question the evidence.
They show that triangulation in science requires multiple robust lines of research.
New Nature podcast episode about ERROR and the Perspectives on Scientific Error workshop!
“We pay experts to examine important and influential scientific publications for errors ... We expect most published research to contain some errors ... our reward system pays bonuses to both authors and reviewers even when minor errors are found ..."
statmodeling.stat.columbia.edu/2025/07/13/e...
✨ ERROR (@error.reviews) is a bug-bounty program for science that seeks to estimate the prevalence and nature of errors. error.reviews
EU legislation requiring clothes be reused and recycled may be based on a numerical error in a 2017 NGO report where $460 billion was added instead of subtracted.
www.frontiersin.org/journals/sus...