Stephan Hollander (@stephanhollander) Bsky

What are the worst ethical disasters in NLP history?

(I'm teaching "ethics of NLP" tomorrow and history is good for teaching this topic.)

Most are data breaches/releases (AOL search logs, OKCupid profiles, Finnish therapy records...) but what others?

I'll put some other examples in thread --> 1/n

2 days ago 42 11 8 0

U.S. Supreme Court Records and Briefs: The Arguments That Shaped America, Now Freely Available | Internet Archive Blogs

I'm excited to share that we've made a collection of historic Supreme Court Records and Briefs available via
@archive.org

I've written a blog post where I go into detail about the importance of this collection.

blog.archive.org/2026/04/20/u...

2 days ago 620 241 4 28

Daily stock returns since 1950.

5 days ago 15 4 2 1

We have started the second day of the MPWZ-CEPR Text-as-Data Workshop (already the 11th edition).
Join us here: ethz.zoom.us/j/62143211732

Text-as-data is used across economics now -- from projects on gender norms to mafia networks, and sanctions evasion, to superstar scientists and AI patents 📈📊

1 week ago 2 2 1 0

📕 💻 📈 Are you curious about the latest work on text-as-data in economics and beyond? We have just kicked off the 11th MPWZ-CEPR Text-as-Data Workshop! We will "time-travel" with LLMs, track narratives, and much more! 40 papers, one link: ethz.zoom.us/j/62143211732.
Program: tinyurl.com/yc2zvy7u

1 week ago 3 2 1 0

🧵1/ Our first meta-science paper (with 350+ coauthors) is published today in Nature. It presents one of the largest-ever reproducibility projects in economics & political science.

Here’s what we found 👇

3 weeks ago 166 89 2 21

found something rather baffling when researching my column this week…

I wanted to see if there was any evidence that AI tools were helping economists to make their research more readable. So I analysed the text of NBER working paper abstracts…

1 month ago 29 11 1 2

Wait, what’s the connection there? 🤷‍♂️

1 month ago 1 0 1 0

🗄 history of NLP and the ACL | Are.na

I'm lecturing about the "History of NLP" this week. What should I include? Any favorite anecdotes, images, people, methods? Slides, books, papers, or talks for inspiration or grounding?

I've been maintaining a small collection here: www.are.na/maria-antoni...

1 month ago 80 15 26 2

Threading some stuff about oil & oil markets, just basic but hope it helps:
1/ oil markets are what you call “finely balanced”. Supply is usually very very close to demand/consumption. Demand is hard to shift *quickly* in response to supply hiccups.
So even small supply changes = big price effects

1 month ago 206 87 3 10

A lifetime of collecting: the 70,000-volume home library of Bruno Schröder, a mining engineer. A wonderland of books 🤩 www.rarebookhub.com/articles/3355

1 month ago 5 0 0 0

Published paper proving that #ChatGPT will always make things up.

Not sometimes. Not until the next update. Always. They proved it with math.

Even with perfect data and unlimited computing power, AI models will still confidently tell you things that are completely false.

arxiv.org/abs/2509.04664

1 month ago 94 38 4 2

🏷️ @aleximas.bsky.social

1 month ago 3 1 0 0

NY State Senate Bill 2025-S7263 Imposes liability for damages caused by a chatbot impersonating certain licensed professionals.

NY is proposing to "Impose liability for damages caused by a chatbot impersonating certain licensed professionals." nysenate.gov/legislation/... How does a chatbot trick you into thinking it's a doctor? Senators: If you forgot you were conversing with AI, you need a doctor.

1 month ago 3 2 0 0

Ok I'm in a rabbit hole. If you search "how many decisions do we make in a day" the reported number is almost always 35,000, often attributed to "multiple sources". Yet I can't actually find a single source that backs up that number. Anyone know where this number comes from?

1 month ago 22 8 6 0

Spicy take of the day: there are _always_ unmeasured confounders. We just make value judgements over how much they matter with respect to Y
#statsky #rstats

6 months ago 21 2 2 2

[ #GenAI-post warning] Almost every researcher I know is using Claude Code, and talking about the huge productivity gains. Are we actually producing more scientific papers yet? Since its release in May 2025, arXiv submissions are indeed *12%* above what we'd expect. Details in thread:

1 month ago 28 3 4 2

Looks very promising! Thanks for sharing it here

2 months ago 3 1 0 0

Our paper “Inferring fine-grained migration patterns across the United States” is now out in @natcomms.nature.com! We released a new, highly granular migration dataset. 1/9

2 months ago 71 27 2 5

Love the NLP: thoughtful application @economist.com

2 months ago 8 2 0 0

'Taxi' and 'cab' essentially mean the same thing, but 'cab' doesn't come from 'taxicab.'

It comes from ‘cabriolet,’ which was a type of light carriage.

‘Taxi’ comes from ‘taximeter,’ which is the device that calculates the amount of a fare based on the distance traveled.

2 months ago 731 129 13 16

GitHub - reifjulian/strgroup: Match strings based on their Levenshtein edit distance.

I've released a new version of strgroup, a Stata command that does fuzzy string matching. No new functionality, but the underlying C code has been optimized: it now uses much less memory and runs about 5 times faster
github.com/reifjulian/s...
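
For readers unfamiliar with the idea, here is a minimal Python sketch of Levenshtein-based fuzzy matching. It is not strgroup's code (that is a Stata command backed by C). The helper `is_fuzzy_match`, its `threshold` default of 0.25, and the choice to normalize by the shorter string's length are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def is_fuzzy_match(a: str, b: str, threshold: float = 0.25) -> bool:
    """Hypothetical matching rule: edit distance divided by the length of
    the shorter string must not exceed threshold (an assumed default)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return a == b
    return levenshtein(a, b) / shorter <= threshold
```

With this rule, `is_fuzzy_match("Jon Smith", "John Smith")` is true (one edit over nine characters), while `is_fuzzy_match("taxi", "cab")` is false. Only two rows of the distance table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.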

2 months ago 28 9 1 1

Here in Europe, I often hear British English–style pronunciations like DAH-ta, STAH-ta, and LAH-tech—quite consistent?

3 months ago 0 0 1 0

How to pronounce "Stata" - Jason Kerwin From the Statalist FAQ (emphasis mine): 4.1 What is the correct way to pronounce ‘Stata’? Stata is an invented word. Some pronounce it with a long a as in day (Stay-ta); some pronounce it with a short...

Classic question! (LaTeX fans know the struggle.) According to this discussion jasonkerwin.com/nonparibus/2... I’m guessing they leave it up in the air?

3 months ago 1 0 1 0

Open Studio with Pop-Up Store:
Friday December 12, 16-19h
Saturday December 13, 11-18h
Join us at:
Studio Christoph Niemann
Schröderstrasse 2
10115 Berlin
shop.christophniemann.com

4 months ago 29 3 1 1

"Captain Gains" on Capitol Hill Shang-Jin Wei & Yifan Zhou WORKING PAPER 34524 DOI 10.3386/w34524 ISSUE DATE November 2025 Using transaction-level data on US congressional stock trades, we find that lawmakers who later ascend to leadership positions perform similarly to matched peers beforehand but outperform them by 47 percentage points annually after ascension. Leaders' superior performance arises through two mechanisms. The political influence channel is reflected in higher returns when their party controls the chamber, sales of stocks preceding regulatory actions, and purchase of stocks whose firms receiving more government contracts and favorable party support on bills. The corporate access channel is reflected in stock trades that predict subsequent corporate news and greater returns on donor-owned or home-state firms.

[Figure 2 (x-axis: Year): Estimated dynamic quasi-difference-in-differences coefficient, di, of equation (3), with vertical dashed lines representing 90 percent confidence intervals. The point estimate of the year in which the lawmaker became a congressional leader (Year 0) is normalized to zero. BHAR over the 250 days following each trade is the dependent variable and calculated using the Fama-French five-factor plus momentum as the benchmark model.]

After becoming a congressional leader, a politician’s stock portfolio beats out those of peers by 47 (!!!) percentage points a year through trades timed around bills and firms that later get government contracts

www.nber.org/papers/w34524

via @florianederer.bsky.social

4 months ago 1435 628 32 83

Interesting paper highlighting that binning can be misspecified in panel settings - this drives misinterpretation of extreme temperature shocks. #linkoftheday

www.dropbox.com/scl/fi/1ya6z...

4 months ago 71 18 5 4

In light of record submission rates and a large volume of AI-generated slop, SocArXiv recently implemented a policy requiring ORCIDs linked in the OSF profile of submitting authors, and narrowing our focus to social science subjects. Today we are taking two more steps:
/1

4 months ago 286 143 4 23

GitHub - BenjaminGor/Latex_Notes_Tutorial: Latex Book/Note Writing Tutorial

How to Reproduce this Book Exactly with LaTeX - great resource for writing LaTeX #linkoftheday
github.com/BenjaminGor/...

5 months ago 23 3 0 0

This is terrifying.

"[AI agents] can... infer a researcher's latent hypotheses and produce data that artificially confirms them."

...

"We can no longer trust that survey responses are coming from real people" -@seanjwestwood.bsky.social

5 months ago 312 121 7 17
