How do you align AI in a world of plural, conflicting, and evolving human values?
A starting point is human society itself.
@sydneylevine.bsky.social and I are hiring a postdoc at NYU to combine insights from cultural evolution, computational moral cognition, and AI safety.
Please share widely! 1/
Posts by Ben Tappin
It’s a little known fact that DAG stands for Dynamically Adjusted Gaslighting
Giving vibes
I wonder how these rates of sycophancy (and their effects) compare against realistic counterfactuals like talking with one’s close friends or spouse.
📣 New job at Oxford's Centre for Advanced Social Science Methods (CASSM)! 📣
The Departments of Politics & IR (DPIR) and Social Policy and Intervention (DSPI) are hiring an Associate Professor of Causal and Experimental Methods.
Come work with me and amazing Oxford peeps! Deadline NOON April 27th.
Students skipping this lecture Not At Random
The synth backing track takes the vibe to an unexpected place 🥲
Terminator (missing data) stalking a scared child (researchers).
Closing out the teaching semester this week with one of my favourite topics. Sadly its pedagogy was forever and unforgivably mar'd by the truly worst naming convention of all time...
"The book of y tho" by Judea Pearl
I am impressed by your ability to successfully operationalize an integer scale for your feelings even though there is no true scale!
Ahh sorry I understand! (Clearly my powers of understanding leave a lot to be desired.) I agree that if true this would be substantively interesting in addition to being predictively useful. Your suggestion reminds me of this paper, an instant classic for various reasons www.pnas.org/doi/full/10....
I didn’t understand your comment at first—until I realised maybe you’re interpreting CV as curriculum vitae; I meant cross validation! 😅
Yes. Then if you push on this the claim becomes “okay, it may not cause, but it’s still useful because it predicts”. But then, if predictive accuracy were your goal, the design and analysis should look different, e.g., CV + more predictors. I’ve been fully Westfall & Yarkoni-pilled on this point.
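A minimal sketch of that point in R (the simulated data and effect sizes are made up for illustration): if prediction is the goal, you compare models on cross-validated error rather than on a single coefficient's p-value.

```r
# Minimal sketch: evaluate models by out-of-sample error, not in-sample fit.
set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 0.3 * d$x1 + 0.5 * d$x2 + 0.4 * d$x3 + rnorm(n)

# k-fold cross-validated RMSE for a model formula (outcome assumed to be y)
cv_rmse <- function(formula, data, k = 10) {
  folds <- sample(rep(1:k, length.out = nrow(data)))
  sq_errs <- lapply(1:k, function(i) {
    fit  <- lm(formula, data = data[folds != i, ])
    pred <- predict(fit, newdata = data[folds == i, ])
    (data$y[folds == i] - pred)^2
  })
  sqrt(mean(unlist(sq_errs)))
}

cv_rmse(y ~ x1, d)            # focal predictor only: higher error
cv_rmse(y ~ x1 + x2 + x3, d)  # richer predictive model: lower error
```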
Sydney named its socio-economic divide the "latte line" and has been arguing about who drew it for 20 years. London has the same divide and calls it "character." I built a machine learning model to do the impolite thing: draw it and blame someone.
open.substack.com/pub/laurenle...
someone at the pentagon frantically typing “Claude, open the strait of Hormuz for me, quickest possible strategy, make no mistakes.”
Your Claude is getting more interesting chat than mine (sorry Ben’s Claude!).
“Claude, we’re going to refactor <function name here>.R to speed it up. Please analyze bottlenecks and make a plan.” x147
The character of Per’s posts increasingly reminds me of when an LLM talks to itself (not derogatory)
The idea of the need for a control group may have been radical at the time, but trust me, it's now generally accepted. This is not some fringe belief of methodological hardliners; it's established practice.
www.clinicaltrialsabundance.blog/p/clinical-t...
New newspaper headline for your Intro to Causal Inference lecture just dropped
“I don’t like tibbles”, “pipes are overrated”, “loops are great actually”, … what’s next? Ah yes — the R contrarian character arc 😄 If we continue the escalation, the next stages usually look like:
• “Actually, base R plotting is more transparent.”
• “data.table was right all along.”
• “I write my own S3 methods for fun.”
• “Vectorization is just premature optimization avoidance.”
• “apply() is perfectly readable.”
• “Why depend on 40 packages for a left join?”
• “I don’t trust NSE.”
• “Magrittr changed evaluation semantics and nobody talks about it.”
• “One well-written for-loop beats clever code.”
• “I opened the tidyverse source code once and never recovered.”
Planning my next conversation starter
I too was very glad to see this! But I feel like the whole episode bodes badly for the future. It’s not sustainable to rely on the CEO of a private company to act against their financial self-interest in order to curtail high-risk AI deployment (here mass surveillance and fully autonomous weapons).
🔔 “How real is the LLM threat to online research in academia?” will be live today.
Experts from Microsoft Research, MIT / Stanford, Max Planck Institute, and Prolific discuss the threat of agentic AI to online research, and how to protect against it.
Link to join live below. #AcademicSky #Research
Reposting for visibility. Many researchers still appear oblivious to this fact, which is terrifying! It should be covered in every experiment design 101 course.
When I pitch academics on my paper on nulls, one common and understandable reaction is "but they're probably noisy and thus uninformative nulls." This is true, but it misses the key realization that WE PUBLISH THE RESULT WHEN THE NOISY TEST IS P<0.05.
It must be very hard to publish null results

Publication practices in the social sciences act as a filter that favors statistically significant results over null findings. While the problem of selection on significance (SoS) is well-known in theory, it has been difficult to measure its scope empirically, and it has been challenging to determine how selection varies across contexts. In this article, we use large language models to extract granular and validated data on about 100,000 articles published in over 150 political science journals from 2010 to 2024. We show that fewer than 2% of articles that rely on statistical methods report null-only findings in their abstracts, while over 90% of papers highlight significant results. To put these findings in perspective, we develop and calibrate a simple model of publication bias. Across a range of plausible assumptions, we find that statistically significant results are estimated to be one to two orders of magnitude more likely to enter the published record than null results. Leveraging metadata extracted from individual articles, we show that the pattern of strong SoS holds across subfields, journals, methods, and time periods. However, a few factors such as pre-registration and randomized experiments correlate with greater acceptance of null results. We conclude by discussing implications for the field and the potential of our new dataset for investigating other questions about political science.
I have a new paper. We look at ~all stats articles in political science post-2010 & show that 94% have abstracts that claim to reject a null. Only 2% present only null results. This is hard to explain unless the research process has a filter that only lets rejections through.
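A minimal simulation of that filter in R (the numbers are illustrative, not from the paper): every study below has a true effect of exactly zero, yet conditioning on p < 0.05 yields a "published" record made entirely of significant results with inflated effect sizes.

```r
# Selection on significance: all true effects are zero, but the
# filtered record is dominated by large-looking "significant" results.
set.seed(1)
n_studies <- 10000
n_per     <- 50   # participants per arm

sims <- replicate(n_studies, {
  x <- rnorm(n_per); y <- rnorm(n_per)   # true group difference = 0
  tt <- t.test(x, y)
  c(p = tt$p.value, est = unname(diff(tt$estimate)))
})

published <- sims[, sims["p", ] < 0.05]
ncol(published) / n_studies    # ~0.05: only the noisiest 5% pass the filter
mean(abs(published["est", ]))  # selected estimates sit well away from zero
```

The filtered mean effect is far from zero even though every true effect is zero, which is exactly the sense in which "we publish the result when the noisy test is p < 0.05" matters.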
@benmtappin.bsky.social I was just pointed to this, which is much more thorough and arrives at the same conclusion: www.exponentialview.co/p/how-95-esc...
The "95% fail" number is essentially meaningless
Felix and friends looking closely at the details so you don’t have to 👌👇
A short note on questionable AI studies, or why friends don’t let friends make %-claims based on small-n qualitative interview reports
New week, new AI newsletter from Marina and myself here at RISJ: buff.ly/ckaUSn9
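A quick R illustration of the small-n problem (the 14-of-20 split is hypothetical, not from the note): the point estimate "70%" sounds precise, but the exact binomial confidence interval covers much of the scale.

```r
# Hypothetical: 14 of 20 interviewees "agree", reported as "70%".
binom.test(14, 20)$conf.int     # 95% CI roughly 0.46 to 0.88

# The same proportion from a survey-sized sample is far tighter.
binom.test(700, 1000)$conf.int  # 95% CI roughly 0.67 to 0.73
```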
Excited to dig into this! Thanks for the work, Luc and team. Quick question: what’s happening with the y-axis labels in figure 2 (0-20-80-60, etc.)? At first I thought I was misunderstanding something about your measurement, but I can’t see where. Are they just typos?