Leveraging 30,000 stock predictors and AI to generate 288 finance papers with custom hypotheses, highlighting risks of automated research and HARKing (Hypothesizing After Results are Known) in academia, from Robert Novy-Marx and @VelikovMihail https://www.nber.org/papers/w33363
Posts by Mihail Velikov
Agree completely! One point was to show how close you get with simple prompts. We are working on quantifying and fixing the hallucinations, though that will take more work. But even with the current state of agentic AI, a lot of the remaining issues are, I think, fixable, let alone with what comes next.
Thank you, @amanela.bsky.social! We did consider that, but decided against it due to the ethical considerations and the strain it would have put on editors and referees. I'm pretty sure they could be published somewhere, and I was super curious how high up the ladder they would have made it.
Very excited to release a major revision to our paper on algorithmic collusion by large language models.
#EconSky
Researchers used AI to generate 288 complete academic finance papers predicting stock returns, complete with plausible theoretical frameworks & citations. Each paper looks and reads like legitimate research.
They did this to show how easy it now is to mass produce "credible" research. Academia isn't ready.
Thanks for featuring our work, Ethan!
Really cool work that raises questions about how we think about science and progress. The push toward pre-registration has benefits, but forgoing HARKing has many costs too. The optimal balance is not obvious!
Cool results that raise interesting questions about a swath of asset pricing papers. Also really like the assaying framework they use.
In the paper we raise further questions about research integrity and evaluation that reflect the realities of AI-enabled research production and give some initial thoughts on ways to address those.
Key implication: When AI can rapidly produce plausible hypotheses for any empirical finding at unprecedented scale, how do we ensure quality control in academic research?
Another version for the OLM signal invokes production-based asset pricing arguments and cites Cochrane (1992) and Zhang (2005). While the stories are not always flawless, they are remarkably coherent, especially considering the scale at which we can produce them.
For example, one of the signals is the ratio of current assets to EBITDA. The LLM creatively names the signal "Operating Liquidity Margin". One version hypothesizes that OLM predicts returns due to slow diffusion of information and cites Hirshleifer and Teoh's (2003) limited attention model.
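The OLM signal described above is just a ratio of two accounting items. A minimal illustration (the paper's exact construction may differ, e.g., in scaling or winsorization; the numbers below are made up):

```python
def operating_liquidity_margin(current_assets: float, ebitda: float) -> float:
    """OLM = current assets / EBITDA, as described in the thread."""
    if ebitda == 0:
        raise ValueError("EBITDA must be nonzero")
    return current_assets / ebitda

# Illustrative firm: $500M current assets, $200M EBITDA
print(operating_liquidity_margin(500.0, 200.0))  # → 2.5
```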
The papers are remarkably coherent - they include creative names for the signals, contain custom introductions providing different hypotheses for the observed predictability patterns, and incorporate citations to existing (and, on occasion, imagined) literature.
To assess this question, we:
1⃣ Mined 30K+ potential stock return predictors
2⃣ Validated 96 robust signals using our "Assaying Anomalies" protocol
3⃣ Used LLMs to generate three versions of a complete paper, each with a different hypothesis, for each signal
Papers & code are available at:
github.com/velikov-miha...
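The third step above can be sketched as a prompt-template loop. This is a hedged illustration, not the authors' actual pipeline (see the linked repository for that): the prompt wording, the list of hypothesized channels, and the stub in place of a real LLM call are all assumptions.

```python
# Hypothesized economic channels, echoing the ones mentioned in the thread
# (limited attention, production-based asset pricing). The third is assumed.
HYPOTHESIS_CHANNELS = [
    "limited investor attention and slow information diffusion",
    "production-based asset pricing and investment frictions",
    "mispricing driven by limits to arbitrage",
]

def build_prompt(signal_name: str, definition: str, channel: str) -> str:
    # Hypothetical prompt template; the paper's prompts differ.
    return (
        f"Write the introduction of a finance paper showing that "
        f"{signal_name} ({definition}) predicts stock returns. "
        f"Develop a hypothesis based on {channel}, and cite related literature."
    )

def generate_paper_versions(signal_name: str, definition: str) -> list[str]:
    # Stub: in a real pipeline, each prompt would be sent to an LLM API
    # and the completion collected as one version of the paper.
    return [build_prompt(signal_name, definition, ch) for ch in HYPOTHESIS_CHANNELS]

prompts = generate_paper_versions(
    "Operating Liquidity Margin", "current assets / EBITDA"
)
print(len(prompts))  # one prompt per hypothesized channel
```

Run over 96 validated signals, three channels each, this kind of loop yields the 288 paper versions at essentially zero marginal cost, which is the point the thread is making.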
An academic paper has excellent empirical evidence & hypotheses that perfectly match the patterns in the data.
One catch: AI wrote the hypotheses after seeing the results.
Should this matter?
New paper w/ Robert Novy-Marx on AI-Powered (Finance) Scholarship🧵
papers.ssrn.com/sol3/papers....