What are the worst ethical disasters in NLP history?
(I'm teaching "ethics of NLP" tomorrow and history is good for teaching this topic.)
Most are data breaches/releases (AOL search logs, OKCupid profiles, Finnish therapy records...) but what others?
I'll put some other examples in thread --> 1/n
Posts by Stephan Hollander
I'm excited to share that we've made a collection of historic Supreme Court Records and Briefs available via
@archive.org
I've written a blog post where I go into detail about the importance of this collection.
blog.archive.org/2026/04/20/u...
Daily stock returns since 1950.
We have started the second day of the MPWZ-CEPR Text-as-Data Workshop (already the 11th edition).
Join us here: ethz.zoom.us/j/62143211732
Text-as-data is used across economics now -- from projects on gender norms to mafia networks, and sanctions evasion, to superstar scientists and AI patents 📈📊
📕 💻 📈 Are you curious about the latest work on text-as-data in economics and beyond? We have just kicked off the 11th MPWZ-CEPR Text-as-Data Workshop! We will "time-travel" with LLMs, track narratives, and much more! 40 papers, one link: ethz.zoom.us/j/62143211732.
Program: tinyurl.com/yc2zvy7u
🧵1/ Our first meta-science paper (with 350+ coauthors) is published today in Nature. It presents one of the largest-ever reproducibility projects in economics & political science.
Here’s what we found 👇
found something rather baffling when researching my column this week…
I wanted to see if there was any evidence that AI tools were helping economists to make their research more readable. So I analysed the text of NBER working paper abstracts…
Wait, what’s the connection there? 🤷‍♂️
I'm lecturing about the "History of NLP" this week. What should I include? Any favorite anecdotes, images, people, methods? Slides, books, papers, or talks for inspiration or grounding?
I've been maintaining a small collection here: www.are.na/maria-antoni...
Threading some stuff about oil & oil markets, just basic but hope it helps:
1/ oil markets are what you call “finely balanced”. Supply is usually very very close to demand/consumption. Demand is hard to shift *quickly* in response to supply hiccups.
So even small supply changes = big price effects
A lifetime of collecting: the 70,000-volume home library of Bruno Schröder, a mining engineer. A wonderland of books 🤩 www.rarebookhub.com/articles/3355
Published paper proving that #ChatGPT will always make things up.
Not sometimes. Not until the next update. Always. They proved it with math.
Even with perfect data and unlimited computing power, AI models will still confidently tell you things that are completely false.
arxiv.org/abs/2509.04664
🏷️ @aleximas.bsky.social
NY is proposing to "Impose liability for damages caused by a chatbot impersonating certain licensed professionals." nysenate.gov/legislation/... How does a chatbot trick you into thinking it's a doctor? Senators: If you forgot you were conversing with AI, you need a doctor.
Ok I'm in a rabbit hole. If you search "how many decisions do we make in a day" the reported number is almost always 35,000, often reported that this is according to "multiple sources". Yet I can't actually find a single source that backs up that number. Anyone know where this number comes from?
Spicy take of the day: there are _always_ unmeasured confounders. We just make value judgements over how much they matter with respect to Y
#statsky #rstats
[ #GenAI-post warning] Almost every researcher I know is using Claude Code, and talking about the huge productivity gains. Are we actually producing more scientific papers yet? Since its release in May 2025, arXiv submissions are indeed *12%* above what we'd expect. Details in thread:
Looks very promising! Thanks for sharing it here
Our paper “Inferring fine-grained migration patterns across the United States” is now out in @natcomms.nature.com! We released a new, highly granular migration dataset. 1/9
Love the NLP: thoughtful application @economist.com
'Taxi' and 'cab' essentially mean the same thing, but 'cab' doesn't come from 'taxicab.'
It comes from ‘cabriolet,’ which was a type of light carriage.
‘Taxi’ comes from ‘taximeter,’ which is the device that calculates the amount of a fare based on the distance traveled.
I've released a new version of strgroup, a Stata command that does fuzzy string matching. No new functionality, but the underlying C code has been optimized: it now uses much less memory and runs about 5 times faster
github.com/reifjulian/s...
Here in Europe, I often hear British English–style pronunciations like DAH-ta, STAH-ta, and LAH-tech—quite consistent?
Classic question! (LaTeX fans know the struggle.) According to this discussion jasonkerwin.com/nonparibus/2... I’m guessing they leave it up in the air?
Open Studio with Pop-Up Store:
Friday December 12, 16-19h
Saturday December 13, 11-18h
Join us at:
Studio Christoph Niemann
Schröderstrasse 2
10115 Berlin
shop.christophniemann.com
"Captain Gains" on Capitol Hill Shang-Jin Wei & Yifan Zhou WORKING PAPER 34524 DOI 10.3386/w34524 ISSUE DATE November 2025 Using transaction-level data on US congressional stock trades, we find that lawmakers who later ascend to leadership positions perform similarly to matched peers beforehand but outperform them by 47 percentage points annually after ascension. Leaders' superior performance arises through two mechanisms. The political influence channel is reflected in higher returns when their party controls the chamber, sales of stocks preceding regulatory actions, and purchase of stocks whose firms receiving more government contracts and favorable party support on bills. The corporate access channel is reflected in stock trades that predict subsequent corporate news and greater returns on donor-owned or home-state firms.
Figure 2 (x-axis: Year): Estimated dynamic quasi-difference-in-differences coefficient, di, of equation (3), with vertical dashed lines representing 90 percent confidence intervals. The point estimate of the year in which the lawmaker became a congressional leader (Year 0) is normalized to zero. BHAR over the 250 days following each trade is the dependent variable and calculated using the Fama-French five-factor plus momentum as the benchmark model.
After becoming a congressional leader, a politician’s stock portfolio beats out those of peers by 47 (!!!) percentage points a year through trades timed around bills and firms that later get government contracts
www.nber.org/papers/w34524
via @florianederer.bsky.social
Interesting paper highlighting that binning can be misspecified in panel settings, which drives misinterpretation of extreme temperature shocks. #linkoftheday
www.dropbox.com/scl/fi/1ya6z...
In light of record submission rates and a large volume of AI-generated slop, SocArXiv recently implemented a policy requiring ORCIDs linked in the OSF profile of submitting authors, and narrowing our focus to social science subjects. Today we are taking two more steps:
/1
How to Reproduce this Book Exactly with LaTeX - great resource for writing LaTeX #linkoftheday
github.com/BenjaminGor/...
This is terrifying.
"[AI agents] can... infer a researcher's latent hypotheses and produce data that artificially confirms them."
...
"We can no longer trust that survey responses are coming from real people" -@seanjwestwood.bsky.social