
Posts by Winston Lin

A screenshot showing:
Introduction

These are notes for my class on probability models. In these notes, I walk through the concepts and computation that support modern probability modeling in political science using both maximum likelihood and Bayesian approaches.

The Goal

There are many excellent books on probability models. But I felt the need to write my own. Why? I saw three problems.

First, some classes assign a huge textbook. The strongest and most motivated students might become familiar with the range of topics covered in these textbooks, but mastering them all is impossible. Instead, these textbooks seem like references, something you’re supposed to keep referring back to throughout your career. I know this because many of these books have instructors’ guides that suggest what should be covered in a single semester, what should be skipped, and how one might jump around. Instead, I want a book that students can work through beginning to end, mastering each idea.
Second, some classes assign a variety of sections from several books and a collection of articles. But then the story told in the readings isn’t coherent. The style changes, the authors’ tastes change, and the notation changes. Switching among authors can feel like whiplash when learning a difficult subject. Instead, I want a book that tells a continuous story with consistent style, tastes, and notation.
Third, some classes assign readings that support the lecture material, without exact alignment between the two. For better or worse, the content covered by the instructor in class feels like the most important material. Thus, I want a book that exactly aligns with the material I cover in class.


You guys, @carlislerainey.bsky.social has a free textbook online and it seems really useful: pos5747.github.io/notes/

2 weeks ago

I came across this while trying to refresh my memory about another bilingual joke! A grad school officemate from Montreal told me that at his college, students said something like "Je m'en fiche de la loi de Poisson" or "La loi de Poisson, je m'en fiche" ("I don't care about the Poisson distribution") 😀

2 weeks ago
Screenshot from "Loi de Poisson" (French Wikipedia article). It says, "Ne doit pas être confondu avec Loi de Fisher" ("Not to be confused with Fisher's distribution").


Bilingual joke? French Wikipedia says the Poisson distribution is "not to be confused with Fisher's distribution" (the F-distribution)

fr.wikipedia.org/wiki/Loi_de_...

3 weeks ago
Observation and Experiment — Harvard University Press A daily glass of wine prolongs life—yet alcohol can cause life-threatening cancer. Some say raising the minimum wage will decrease inequality while others say it increases unemployment. Scientists onc...

Rosenbaum's Observation and Experiment is great too. I have sadly not read his more technical books yet.

www.hup.harvard.edu/books/978067...

3 weeks ago

3) All that's for RCTs. For observational studies, the issues are different, and here's a link to an old favorite paper

bsky.app/profile/lins...

4 weeks ago

2) Freedman 2008 showed that ANCOVA I can have a finite-sample bias. Also true of ANCOVA II. Difference-in-means and difference-in-differences (change scores) are exactly unbiased. Gerber & Green (Field Experiments, p. 104) suggest diff-in-diff when N < 20. Simulations with real data could be useful
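
That finite-sample point can be checked with a quick Monte Carlo. Below is a minimal sketch in Python (my own toy data-generating process, not from Freedman's paper): with a nonlinear outcome and tiny N, difference-in-means stays exactly unbiased under complete randomization, while the linear ANCOVA adjustment can pick up a small bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_ate, n_sims = 12, 1.0, 4000  # tiny N, the regime Gerber & Green flag

dim_estimates, ancova_estimates = [], []
for _ in range(n_sims):
    x = rng.normal(size=n)                 # pretest covariate
    y0 = x**2 + rng.normal(size=n)         # control outcome, nonlinear in x
    y1 = y0 + true_ate                     # constant additive treatment effect
    # complete randomization: exactly half the units treated
    t = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n // 2)])
    y = np.where(t == 1, y1, y0)

    dim_estimates.append(y[t == 1].mean() - y[t == 0].mean())

    # ANCOVA I: OLS of y on an intercept, treatment, and the covariate
    X = np.column_stack([np.ones(n), t, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ancova_estimates.append(beta[1])

print("diff-in-means mean estimate:", np.mean(dim_estimates))  # ≈ 1.0, exactly unbiased
print("ANCOVA I mean estimate:     ", np.mean(ancova_estimates))  # can drift slightly at this N
```

With more units the gap closes, which matches the advice to prefer the simpler estimators only at very small N.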

4 weeks ago
Efficiency Study of Estimators for a Treatment Effect in a Pretest-Posttest Trial on JSTOR Li Yang, Anastasios A. Tsiatis, Efficiency Study of Estimators for a Treatment Effect in a Pretest-Posttest Trial, The American Statistician, Vol. 55, No. 4 (Nov., 2001), pp. 314-321

1) Yang & Tsiatis 2001 proved (a) "ANCOVA II" (which I later studied in my 2013 finite-population paper) is asymptotically more efficient than 1 and 3, and (b) with equal sample sizes in treatment & control, 2 ("ANCOVA I") is asymptotically equivalent to ANCOVA II

www.jstor.org/stable/2685694

4 weeks ago

Allison (1990) is helpful for intuition on this (with examples on pp. 97-100 & 109). The assumptions for change scores (diff-in-diff) differ from those of ANCOVA, but are neither stronger nor weaker, so it depends on the process that determines who gets which treatment

statisticalhorizons.com/wp-content/u...
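
Allison's point can be made concrete with a simulation (a hedged toy example I made up, not from the paper): when treatment assignment depends on the pretest, change scores and ANCOVA answer different questions and can disagree even when the true effect is zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

pre = rng.normal(size=n)
t = (pre + rng.normal(size=n) > 0).astype(float)  # higher pretests more likely treated
post = 0.6 * pre + rng.normal(size=n)             # true treatment effect is zero

# Change-score (diff-in-diff) estimator: compare mean gains across groups
gain = post - pre
dd = gain[t == 1].mean() - gain[t == 0].mean()

# ANCOVA estimator: OLS of post on an intercept, treatment, and pretest
X = np.column_stack([np.ones(n), t, pre])
ancova = np.linalg.lstsq(X, post, rcond=None)[0][1]

print(f"change scores: {dd:.2f}")  # negative here: the treated regress toward the mean
print(f"ANCOVA:        {ancova:.2f}")  # near zero under this assignment process
```

Flip the data-generating process (e.g., assignment based on a stable group difference rather than the pretest) and the ranking can flip, which is why neither set of assumptions is uniformly stronger.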

1 month ago
Causal inference for psychologists who think that causal inference is not for them Correlation does not imply causation and psychologists' causal inference training often focuses on the conclusion that therefore experiments are needed—without much consideration for the causal infer...

You need to bring in the same toolkit as in studies that try to establish causality without randomization.

I know it sounds unfair, but I don’t make the rules. These situations are instances of post-treatment bias, if you want to read up on it as a psychologist:

1 month ago

Speaking truth to power

2 months ago

A more user-friendly t-test, regression, variable description, frequency plots, and more.

datacolada.org/132

2 months ago

🚨SOLUTIONS🚨

Desk reject more stuff with actionable feedback.

Don’t request second reviews

Build larger editorial boards of volunteers

Wait to submit your work until it’s ready; a.k.a. don’t send in your half-baked trash hoping for feedback

6/7

2 months ago

I’ve thanked people for spending the time to give me comments

2 months ago

After years in academia, I’m exploring data science and research roles in industry.

I'm a quant. social scientist (PhD Yale ’24, NYU) focused on causal inference, experiments, and large-scale data.

Feel free to get in touch or share; all leads appreciated. dwstommes@gmail.com

2 months ago
We believe that to improve practices, some fundamental rethinking of what we consider a publishable scientific contribution may be necessary. Currently, researchers may feel pressured to do “everything” in a single article—summarize and synthesize the existing literature, suggest a new theory or at least modify an existing one, hypothesize moderation and/or mediation, and provide (preferably positive) empirical evidence through statistical analyses that they run themselves, maybe even across multiple studies they conducted themselves. It is perhaps unsurprising that they end up cutting corners when it comes to causal inference—a hard topic, for which psychologists often receive little training—and rely on out-of-the-box statistical models.


This quote also reminds me of something that we wrote in our paper on path analysis (journals.sagepub.com/doi/10.1177/...). People are just expecting *way* too much of a single study, literally new discoveries exceeding Gregor Mendel's.

2 months ago

How about Don Campbell and his collaborators, who invented regression discontinuity among other things?

2 months ago
RegCheck RegCheck is an AI tool to compare preregistrations with papers instantly.

Comparing registrations to published papers is essential to research integrity - and almost no one does it routinely because it's slow, messy, and time-consuming.

RegCheck was built to help make this process easier.

Today, we launch RegCheck V2.

🧵

regcheck.app

2 months ago

Back in 2017-18, a friend told me that Yale SOM banned laptops in MBA classes

3 months ago
For better learning in college lectures, lay down the laptop and pick up a pen | Brookings Susan Dynarski examines the evidence that students learn better if they aren't using their laptops during lectures.

My syllabi have a footnote recommending the same 2017 @dynarski.bsky.social review that @gregsasso.bsky.social shared. This semester I also looked at Nicholas Decker's recent blog post

www.brookings.edu/articles/for...

nicholasdecker.substack.com/p/should-we-...

3 months ago
“Coding for humans: Best practices for writing software people can read” | Statistical Modeling, Causal Inference, and Social Science

statmodeling.stat.columbia.edu/2026/01/17/c...

3 months ago

Rosenbaum, Observation and Experiment

3 months ago
Writing about technical topics in an accessible manner A wise man – I’m quite sure it was Brian Wansink – once pointed out that it is impossible to both read and write a lot. So, maybe reading a post about how to write just steals time from the more urgen...

Accessibility is *absolutely* key but also hard because of the curse of knowledge. I've written down some writing advice here: www.the100.ci/2024/12/01/w.... If you're more of a technical person, consider teaming up with a substantive researcher for instant audience access.

3 months ago

Some people bring up (1) the cost of criticism and (2) that a lot of criticism has already been voiced but ignored. Both points are valid, so here are some suggestions for (1) reducing backlash and (2) increasing impact (from this talk of mine: juliarohrer.com/wp-content/u...)

3 months ago

Citations always need checking! Just as one example, I used to see my sole-authored 2013 paper cited as “Lin et al.” because Google Scholar’s bib entry had an error :)

3 months ago
Will you incorporate LLMs and AI prompting into the course in the future?
No.

Why won’t you incorporate LLMs and AI prompting into the course?
These tools are useful for coding (see this post for my personal take).

However, they’re only useful if you know what you’re doing first. If you skip the learning-the-process-of-writing-code step and just copy/paste output from ChatGPT, you will not learn. You cannot learn. You cannot improve. You will not understand the code.


That post warns that you cannot use it as a beginner:

…to use Databot effectively and safely, you still need the skills of a data scientist: background and domain knowledge, data analysis expertise, and coding ability.

There is no LLM-based shortcut to those skills. You cannot LLM your way into domain knowledge, data analysis expertise, or coding ability.

The only way to gain domain knowledge, data analysis expertise, and coding ability is to struggle. To get errors. To google those errors. To look over the documentation. To copy/paste your own code and adapt it for different purposes. To explore messy datasets. To struggle to clean those datasets. To spend an hour looking for a missing comma.

This isn’t a form of programming hazing, like “I had to walk to school uphill both ways in the snow and now you must too.” It’s the actual process of learning and growing and developing and improving. You’ve gotta struggle.


This Tumblr post puts it well (it’s about art specifically, but it applies to coding and data analysis too):

Contrary to popular belief the biggest beginner’s roadblock to art isn’t even technical skill it’s frustration tolerance, especially in the age of social media. It hurts and the frustration is endless but you must build the frustration tolerance equivalent to a roach’s capacity to survive a nuclear explosion. That’s how you build on the technical skill. Throw that “won’t even start because I’m afraid it won’t be perfect” shit out the window. Just do it. Just start. Good luck. (The original post has disappeared, but here’s a reblog.)

It’s hard, but struggling is the only way to learn anything.


You might not enjoy code as much as Williams does (or I do), but there’s still value in maintaining coding skills as you improve and learn more. You don’t want your skills to atrophy.

As I discuss here, when I do use LLMs for coding-related tasks, I purposely throw as much friction into the process as possible:

To avoid falling into over-reliance on LLM-assisted code help, I add as much friction into my workflow as possible. I only use GitHub Copilot and Claude in the browser, not through the chat sidebar in Positron or Visual Studio Code. I treat the code it generates like random answers from StackOverflow or blog posts and generally rewrite it completely. I disable the inline LLM-based auto complete in text editors. For routine tasks like generating {roxygen2} documentation scaffolding for functions, I use the {chores} package, which requires a bunch of pointing and clicking to use.

Even though I use Positron, I purposely do not use either Positron Assistant or Databot. I have them disabled.

So in the end, I don’t foresee myself incorporating LLMs into this class. I’m pedagogically opposed to it. I’m facing all sorts of external pressure to do it, but I’m resisting.

You’ve got to learn first.


Some closing thoughts for my students this semester on LLMs and learning #rstats datavizf25.classes.andrewheiss.com/news/2025-12...

4 months ago

I think of the 2019 Nobel as the 2nd wave of the experimental part of the credibility revolution. Ashenfelter, Card, & LaLonde’s work led to major job-training RCTs in the US, and Angrist was one of Duflo’s advisors. Ashenfelter has a nice speech on the early history

legacy.iza.org/en/webconten...

4 months ago

Gentle reminder that a correlation coefficient isn’t a particularly great way to quantify the effect of a dichotomous treatment. See also

www.the100.ci/2025/07/28/w...
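
One concrete reason (a toy illustration of mine, not from the linked post): the point-biserial correlation mixes the effect size with the treated share, so the same mean difference yields different correlations under different designs.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 0.5        # same mean difference (in SD units) in both scenarios
n = 200_000    # large n so simulation noise is negligible

rs = {}
for p_treated in (0.5, 0.1):
    t = (rng.random(n) < p_treated).astype(float)
    y = d * t + rng.normal(size=n)  # treatment shifts the mean by d either way
    rs[p_treated] = np.corrcoef(t, y)[0, 1]
    diff = y[t == 1].mean() - y[t == 0].mean()
    print(f"share treated {p_treated:.0%}: diff in means = {diff:.2f}, r = {rs[p_treated]:.3f}")
```

The mean difference stays at 0.5 in both runs, but r shrinks as the design gets more unbalanced (here roughly 0.24 vs 0.15), so r is not a design-free summary of the treatment effect.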

4 months ago
Interpreting p values and interval estimates based on practical relevance: guidance for the sports medicine clinician Statistical methods are employed in medical research to estimate effects of treatments or health conditions across populations.1 2 This paper presents a framework to avoid common misinterpretations th...

Excellent new editorial and guideline on interpreting p values and interval estimates
bjsm.bmj.com/content/earl...

6 months ago
from Amrhein, Greenland, & McShane ("Retire statistical significance," Nature, 2019)

"For example, the authors above could have written: ‘Like a previous study, our results suggest a 20% increase in risk of new-onset atrial fibrillation in patients given the anti-inflammatory drugs. Nonetheless, a risk difference ranging from a 3% decrease, a small negative association, to a 48% increase, a substantial positive association, is also reasonably compatible with our data, given our assumptions.’ "


from Amrhein, Greenland, & McShane ("Retire statistical significance," Nature, 2019)

"Whatever the statistics show, it is fine to suggest reasons for your results, but discuss a range of potential explanations, not just favoured ones. Inferences should be scientific, and that goes far beyond the merely statistical. Factors such as background evidence, study design, data quality and understanding of underlying mechanisms are often more important than statistical measures such as P values or intervals."


I like this from @vamrhein.bsky.social et al. I assigned it to my class last semester and tried to explain that p-values measure how compatible (vs. surprising) the data are with the null, given our assumptions. But yeah, tests & CIs are hard to understand!

www.blakemcshane.com/Papers/natur...
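
The compatibility reading can be demonstrated with a small simulation (an illustrative sketch of mine, not from the paper): under the null, p-values from a z-test are uniform, so a small p flags data that would be surprising if the null and the other assumptions held.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n, sims = 50, 20_000

def p_value(sample):
    # two-sided z-test of "true mean = 0", treating the variance as known (= 1)
    z = sample.mean() * sqrt(len(sample))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * (1 - Phi(|z|))

p_null = np.array([p_value(rng.normal(0.0, 1.0, n)) for _ in range(sims)])
p_alt = np.array([p_value(rng.normal(0.4, 1.0, n)) for _ in range(sims)])

# Under the null, p is uniform on [0, 1]; with a real effect, small p's pile up.
print("P(p < 0.05) under the null:   ", (p_null < 0.05).mean())  # ≈ 0.05
print("P(p < 0.05) with a real effect:", (p_alt < 0.05).mean())
```

Either way, p says nothing about the probability that the null is true; it only grades how surprising the data are under the null plus the modeling assumptions.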

4 months ago
Nonparametric Estimates of the Labor-Supply Effects of Negative Income Tax Programs | Journal of Labor Economics: Vol 8, No 1, Part 2 This article reports nonparametric estimates of the effect of labor-supply behavior on the payments to families enrolled in the Seattle/Denver Income Maintenance Experiment. The randomized assignment ...

Orley Ashenfelter's papers often have good introductions. Here's Ashenfelter & Plant

www.journals.uchicago.edu/doi/abs/10.1...

5 months ago