
Posts by Ben Tappin

Post image

How do you align AI in a world of plural, conflicting, and evolving human values?

A starting point is human society itself.

@sydneylevine.bsky.social and I are hiring a postdoc at NYU to combine insights from cultural evolution, computational moral cognition, and AI safety.

Please share widely! 1/

1 week ago 6 4 1 1

It’s a little known fact that DAG stands for Dynamically Adjusted Gaslighting

2 weeks ago 6 1 0 0
Post image

Giving vibes

3 weeks ago 2 0 0 0

I wonder how these rates of sycophancy (and their effects) compare against realistic counterfactuals like talking with one’s close friends or spouse.

3 weeks ago 3 0 0 0
Associate Professorship of Causal and Experimental Methods in Politics and Social Policy. University salary from £58,265–£77,645 per annum, which is inclusive of an Oxford University Weighting of £1,730 p.a. Permanent upon completion of a successful review. The review is conducted during th...

📣 New job at Oxford's Centre for Advanced Social Science Methods (CASSM)! 📣

The Departments of Politics & IR (DPIR) and Social Policy and Intervention (DSPI) are hiring an Associate Professor of Causal and Experimental Methods.

Come work with me and amazing Oxford peeps! Deadline NOON April 27th.

4 weeks ago 46 56 1 1

Students skipping this lecture Not At Random

4 weeks ago 1 0 1 0

The synth backing track takes the vibe to an unexpected place 🥲

4 weeks ago 0 0 0 0
Terminator (missing data) stalking a scared child (researchers).

Closing out the teaching semester this week with one of my favourite topics. Sadly its pedagogy was forever and unforgivably marred by truly the worst naming convention of all time...

1 month ago 27 0 2 2
"The book of y tho" by Judea Pearl

1 month ago 34 5 1 0

I am impressed by your ability to successfully operationalize an integer scale for your feelings even though there is no true scale!

1 month ago 1 0 0 0
Preview
The scientific value of numerical measures of human feelings | PNAS Human feelings measured in integers (my happiness is an 8 out of 10, my pain 2 out of 6) have no objective scientific basis. They are “made-up” num...

Ahh sorry I understand! (Clearly my powers of understanding leave a lot to be desired.) I agree that if true this would be substantively interesting in addition to being predictively useful. Your suggestion reminds me of this paper, an instant classic for various reasons www.pnas.org/doi/full/10....

1 month ago 1 0 1 0

I didn’t understand your comment at first—until I realised maybe you’re interpreting CV as curriculum vitae; I meant cross validation! 😅

1 month ago 2 0 1 0

Yes. Then if you push on this, the claim becomes “okay, it may not cause, but it’s still useful because it predicts”. But if predictive accuracy were your goal, the design and analysis should be different, e.g., CV + more predictors. I’ve been fully Westfall & Yarkoni-pilled on this point.

1 month ago 18 1 2 1
Preview
London's Divide Was Called Character. It Was Actually Policy. I built a machine learning model to find London's divide and you can enter your postcode to see which side you're on. We've been blaming the wrong people for it.

Sydney named its socio-economic divide the "latte line" and has been arguing about who drew it for 20 years. London has the same divide and calls it "character." I built a machine learning model to do the impolite thing: draw it and blame someone.
open.substack.com/pub/laurenle...

1 month ago 56 17 2 9

someone at the pentagon frantically typing “Claude, open the strait of Hormuz for me, quickest possible strategy, make no mistakes.”

1 month ago 7988 1216 161 64

Your Claude is getting more interesting chat than mine (sorry Ben’s Claude!).

“Claude, we’re going to refactor <function name here>.R to speed it up. Please analyze bottlenecks and make a plan.” x147

1 month ago 1 0 0 0

The character of Per’s posts increasingly reminds me of when an LLM talks to itself (not derogatory)

1 month ago 1 0 1 0
Preview
Clinical trial reforms that once seemed radical How randomized controlled trials, preregistration, and results reporting became standard practice.

The idea that a control group is needed may have been radical at the time, but trust me, it's now generally accepted. This is not some fringe belief held by methodological hardliners; it's established practice.

www.clinicaltrialsabundance.blog/p/clinical-t...

1 month ago 19 3 1 0
Post image

New newspaper headline for your Intro to Causal Inference lecture just dropped

1 month ago 166 33 4 7
“I don’t like tibbles”, “pipes are overrated”, “loops are great actually”, … what’s next?

Ah yes — the R contrarian character arc 😄
If we continue the escalation, the next stages usually look like:
• “Actually, base R plotting is more transparent.”
• “data.table was right all along.”
• “I write my own S3 methods for fun.”
• “Vectorization is just premature optimization avoidance.”
• “apply() is perfectly readable.”
• “Why depend on 40 packages for a left join?”
• “I don’t trust NSE.”
• “Magrittr changed evaluation semantics and nobody talks about it.”
• “One well-written for-loop beats clever code.”
• “I opened the tidyverse source code once and never recovered.”

Planning my next conversation starter

1 month ago 102 10 21 7

I too was very glad to see this! But I feel like the whole episode bodes badly for the future. It’s not sustainable to rely on the CEO of a private company to act against their financial self-interest in order to curtail high-risk AI deployment (here mass surveillance and fully autonomous weapons).

1 month ago 3 0 1 0
Post image

🔔 “How real is the LLM threat to online research in academia?” will be live today.

Experts from Microsoft Research, MIT / Stanford, Max Planck Institute, and Prolific discuss the threat of agentic AI to online research, and how to protect against it.

Link to join live below. #AcademicSky #Research

1 month ago 7 3 1 0
Advertisement

Reposting for visibility. Many researchers still appear oblivious to this fact, which is terrifying! It should be covered in every experiment design 101 course.

1 month ago 7 0 0 0

When I pitch academics on my paper on nulls one common and understandable reaction is "but they're probably noisy and thus uninformative nulls." This is true, but it misses the key realization that WE PUBLISH THE RESULT WHEN THE NOISY TEST IS P<0.05.

2 months ago 36 4 1 2
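The selection dynamic above can be sketched in a toy simulation (all parameter values hypothetical): a noisy estimator of a small true effect still clears p < 0.05 some of the time, and the estimates that survive the significance filter badly exaggerate the truth.

```python
# Toy simulation of selection on significance: we "publish" an estimate
# only when its two-sided test gives p < 0.05. With a noisy (low-power)
# estimator, the published estimates overstate the true effect.
# TRUE_EFFECT and SE are made-up illustrative values.
import random
import statistics

random.seed(1)

TRUE_EFFECT = 0.1   # small real effect
SE = 0.2            # noisy estimator => low power
CRIT = 1.96         # two-sided 5% z threshold
N = 100_000         # number of simulated studies

published = []
for _ in range(N):
    estimate = random.gauss(TRUE_EFFECT, SE)
    if abs(estimate / SE) > CRIT:   # the significance filter
        published.append(estimate)

print(f"share of studies clearing p < 0.05: {len(published) / N:.1%}")
print(f"mean published estimate: {statistics.mean(published):.2f} "
      f"(truth = {TRUE_EFFECT})")
```

The filtered mean lands several times above the true effect, which is the "noisy test, but we publish it anyway when p < 0.05" problem in miniature.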
It must be very hard to publish null results
Publication practices in the social sciences act as a filter that favors statistically significant results over null findings. While the problem of selection on significance (SoS) is well-known in theory, it has been difficult to measure its scope empirically, and it has been challenging to determine how selection varies across contexts. In this article, we use large language models to extract granular and validated data on about 100,000 articles published in over 150 political science journals from 2010 to 2024. We show that fewer than 2% of articles that rely on statistical methods report null-only findings in their abstracts, while over 90% of papers highlight significant results. To put these findings in perspective, we develop and calibrate a simple model of publication bias. Across a range of plausible assumptions, we find that statistically significant results are estimated to be one to two orders of magnitude more likely to enter the published record than null results. Leveraging metadata extracted from individual articles, we show that the pattern of strong SoS holds across subfields, journals, methods, and time periods. However, a few factors such as pre-registration and randomized experiments correlate with greater acceptance of null results. We conclude by discussing implications for the field and the potential of our new dataset for investigating other questions about political science.

I have a new paper. We look at ~all stats articles in political science post-2010 & show that 94% have abstracts that claim to reject a null. Only 2% present only null results. This is hard to explain unless the research process has a filter that only lets rejections through.

2 months ago 644 222 30 52
Preview
How “95%” escaped into the world – and why so many believed it Challenging sloppy thinking

@benmtappin.bsky.social I just was pointed to this which is much more thorough and arrives at the same conclusion: www.exponentialview.co/p/how-95-esc...

The "95% fail" number is essentially meaningless

2 months ago 1 1 0 0

Felix and friends looking closely at the details so you don’t have to 👌👇

2 months ago 2 0 1 0
Post image

A short note on questionable AI studies or why friends don’t let friends make %-claims based on small-n qualitative research interview reports

New week, new AI newsletter from Marina and myself here at RISJ: buff.ly/ckaUSn9

2 months ago 5 4 1 3

Excited to dig into this! Thanks for the work Luc and team. Quick question: what’s happening with the y axis labels in figure 2 (0-20-80-60 etc.)? At first I thought I was misunderstanding something about your measurement, but I can’t see where. Are they just typos or what?

2 months ago 0 0 1 0