Posts by Ray Bai

Lovely time visiting U. of South Carolina for the USC Department of Statistics 40th Anniversary conference! During my visit, I also got to enjoy Korean bbq at MOA with my 2 USC PhD students Sijian and Fanghua, and have dinner with the legendary statistician/data scientist Bin Yu from UC Berkeley!

3 weeks ago 1 0 0 0

Visiting the University of South Carolina this week, and my third PhD student Sijian Fan successfully defended his PhD dissertation, "Statistical Learning for Binary Data with Applications in Bioinformatics"! Excellent job! #professor #proudadvisor #universityofsouthcarolina

3 weeks ago 1 0 0 0

Excited for my PhD student Sijian's final defense this upcoming Wednesday! It's taking place at 11:00 am EST on Mar. 25 in LeConte 229 at the University of South Carolina! All who are free and on USC's campus at that time are welcome to join!

4 weeks ago 0 0 0 0

Another new preprint! "BVSIMC: Bayesian variable selection-guided inductive matrix completion for improved and interpretable drug discovery" (with my PhD student Sijian Fan and our co-authors Liyan Xiong, Dayuan Wang, and Guoshuai Cai).

Check it out 👉 arxiv.org/abs/2603.18957

1 month ago 0 0 0 0

New preprint of paper with my PhD student Sijian Fan: "BiSSLB: Binary Spike-and-Slab Lasso Biclustering"! We developed a biclustering method for binary data. It works really well on particularly noisy data with many 0's in 1-filled patterns or isolated 1's. arxiv.org/abs/2603.18378

1 month ago 0 0 0 0

I'll be in Columbia, SC next week starting on Wed. for my PhD student Sijian's final dissertation defense and for @uofscstatistics.bsky.social's 40th anniversary conference! Looking forward to seeing old colleagues at USC!

1 month ago 0 0 0 0

Yasss Amy Madigan! Well-deserved Oscar for her incredible performance as Gladys in "Weapons"!

1 month ago 0 1 0 0

I am pleased to share that my paper "VCBART: Bayesian Trees for Varying Coefficients" (with Sameer Deshpande, Cecilia Balocchi, Jennifer Sterling, and Jordan Weiss) has been published in the latest issue of Bayesian Analysis!

Read it here: doi.org/10.1214/24-B...

1 month ago 4 0 0 0
Seven Major Directions and Trends in Modern Statistics – Ray Bai

New blog post: "Seven Major Directions and Trends in Modern Statistics"! In this post, I summarize a few of the latest trends and prominent areas in the field of statistics.

raybai.net/seven-major-...

1 month ago 0 0 0 0

I often explain deep learning and DGMs to non-experts & students who are not familiar but are interested in exploring this area. I find it's very helpful to start by framing linear regression and logistic regression as special cases of neural networks with a single output layer.

1 month ago 1 0 0 0
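
A minimal sketch of that framing, assuming NumPy and a hypothetical `dense_layer` helper (neither is from the post): a network with no hidden layers and a single output layer is linear regression when the output activation is the identity, and logistic regression when it is the sigmoid. The weights below are random placeholders, not fitted values.

```python
import numpy as np


def dense_layer(X, W, b, activation):
    """One fully connected layer: activation(XW + b)."""
    return activation(X @ W + b)


def identity(z):
    # Identity output activation -> linear regression predictions
    return z


def sigmoid(z):
    # Sigmoid output activation -> logistic regression probabilities
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 observations, 3 predictors
W = rng.normal(size=(3, 1))   # placeholder coefficients (illustrative only)
b = np.zeros(1)               # intercept

linear_pred = dense_layer(X, W, b, identity)  # same as X @ W + b
logit_prob = dense_layer(X, W, b, sigmoid)    # P(y = 1 | x) under a logistic model
```

Both models share the same linear predictor; only the output activation (and, when fitting, the loss) differs.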

Happening tomorrow at UMBC! Excited for my visit

2 months ago 0 0 0 0

(2/2) Never thought of myself as much of a probabilist either, but my recent work on DGMs delved into functional inequalities in probability theory to characterize transport maps. You just never know when these things will pop up or when you'll use them!

2 months ago 0 0 0 0

(1/2) It's always a bit wild to me when something I learned many years ago comes up again. I wasn't sure I'd ever use differential equations again, but now with flow matching and diffusion models being the current state-of-the-art generative models, I'm reviewing a bit of ODEs.

2 months ago 3 0 1 0

A bit late but group pic from the Maryland Statistics Symposium at Brin Mathematics Research Center this past Dec! Left to right: Jianhui Zhou, Gemma Moran, Lizhen Lin, Alden Green, Ray Bai, Anindya Roy, Cindy Rush, Yubai Yuan, Yun Yang, Yang Feng, Anderson Ye Zhang, Yanyuan Ma

2 months ago 0 0 0 0

I hope Sinners wins the Academy Award for Best Picture this year. Not just because it was an incredible movie, but because as a longtime horror aficionado, this would signal a broader appetite for horror & other genre-bending films in the academy (justice for Get Out!).

2 months ago 0 0 0 0

I'm giving a talk "Deep Generative Models for Statistical Problems: Methods, Computation, and Theory" at the UMBC Mathematics and Statistics Dept next Friday, Feb. 20 from 11:00 am-12:00 pm! Come join if you're in the area. mathstat.umbc.edu/events/event...

2 months ago 2 0 0 1

😍

2 months ago 0 0 0 0

This Super Bowl game is fairly boring, but absolutely loved the Halftime Show and the other musical performances! Green Day, Lady Gaga, Bad Bunny ❤️❤️

2 months ago 0 0 0 0

Congrats to my collaborator and former student Qingyang Liu (I taught him in 2 classes, served on his dissertation committee, and have co-authored several papers with him)! He will be joining @wakeforeststats.bsky.social as an Assistant Professor in July. 🥳Great department!

2 months ago 0 0 0 0
Open-Rank, Tenured/Tenure-Track Statistics Faculty - Fairfax, VA, Virginia, United States Department: Col of Engineering and Computing Classification: 9-month Instructional Faculty Job Category: Instructional Faculty Job Type: Full-Time Work Schedule: Full-time (1.0 FTE, 40 hrs/wk) Locatio...

To anyone who is on the job market in Statistics this academic year: the George Mason University (GMU) Department of Statistics is hiring for open-rank, tenure-track or tenured positions!

For full consideration, apply by January 14 at this link: tinyurl.com/6mjs8fye

3 months ago 1 3 0 0

So maddening what happened at the University of Nebraska-Lincoln

magazine.amstat.org/blog/2026/01...

3 months ago 0 0 0 0

Our paper "Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression" was published in the most recent issue of Computational Statistics. Read the paper here: doi.org/10.1007/s001...

3 months ago 1 0 0 0
Will you incorporate LLMs and AI prompting into the course in the future?
No.

Why won’t you incorporate LLMs and AI prompting into the course?
These tools are useful for coding (see this for my personal take on this).

However, they’re only useful if you know what you’re doing first. If you skip the learning-the-process-of-writing-code step and just copy/paste output from ChatGPT, you will not learn. You cannot learn. You cannot improve. You will not understand the code.

In that post, it warns that you cannot use it as a beginner:

…to use Databot effectively and safely, you still need the skills of a data scientist: background and domain knowledge, data analysis expertise, and coding ability.

There is no LLM-based shortcut to those skills. You cannot LLM your way into domain knowledge, data analysis expertise, or coding ability.

The only way to gain domain knowledge, data analysis expertise, and coding ability is to struggle. To get errors. To google those errors. To look over the documentation. To copy/paste your own code and adapt it for different purposes. To explore messy datasets. To struggle to clean those datasets. To spend an hour looking for a missing comma.

This isn’t a form of programming hazing, like “I had to walk to school uphill both ways in the snow and now you must too.” It’s the actual process of learning and growing and developing and improving. You’ve gotta struggle.

This Tumblr post puts it well (it’s about art specifically, but it applies to coding and data analysis too):

Contrary to popular belief the biggest beginner’s roadblock to art isn’t even technical skill it’s frustration tolerance, especially in the age of social media. It hurts and the frustration is endless but you must build the frustration tolerance equivalent to a roach’s capacity to survive a nuclear explosion. That’s how you build on the technical skill. Throw that “won’t even start because I’m afraid it won’t be perfect” shit out the window. Just do it. Just start. Good luck. (The original post has disappeared, but here’s a reblog.)

It’s hard, but struggling is the only way to learn anything.

You might not enjoy code as much as Williams does (or I do), but there’s still value in maintaining coding skills as you improve and learn more. You don’t want your skills to atrophy.

As I discuss here, when I do use LLMs for coding-related tasks, I purposely throw as much friction into the process as possible:

To avoid falling into over-reliance on LLM-assisted code help, I add as much friction into my workflow as possible. I only use GitHub Copilot and Claude in the browser, not through the chat sidebar in Positron or Visual Studio Code. I treat the code it generates like random answers from StackOverflow or blog posts and generally rewrite it completely. I disable the inline LLM-based auto complete in text editors. For routine tasks like generating {roxygen2} documentation scaffolding for functions, I use the {chores} package, which requires a bunch of pointing and clicking to use.

Even though I use Positron, I purposely do not use either Positron Assistant or Databot. I have them disabled.

So in the end, for pedagogical reasons, I don’t foresee me incorporating LLMs into this class. I’m pedagogically opposed to it. I’m facing all sorts of external pressure to do it, but I’m resisting.

You’ve got to learn first.

Some closing thoughts for my students this semester on LLMs and learning #rstats datavizf25.classes.andrewheiss.com/news/2025-12...

4 months ago 331 99 14 31
Exam question: After you have explained 97% confidence to Bob, he responds, "I see. 97% is pretty good, but it could be great if we can make a 100% confidence interval." What is your response to this?

Student's answer: "Bob, you are a fool amongst fools. Truly, I pity you. A 100% confidence interval would be useful as it would give us a result of all real numbers. Taht's the only way to be 100% sure our true mean is in the interval; if every number could be included."

Grading my final exams for undergrad probability & statistics, and this response to one of my questions seriously made me laugh out loud for minutes. Should I give Extra Credit for the student's response? "Bob, you are a fool amongst fools." 😂😂😂

4 months ago 1 0 0 0
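
One way to make the student's point precise, assuming the textbook setting of a normal mean with known standard deviation $\sigma$ and sample size $n$ (an assumption; the exam's exact setup is not shown in the post):

```latex
\[
\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}},
\qquad
z_{1-\alpha/2} = \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right) \to \infty
\quad \text{as } \alpha \to 0 .
\]
```

At 100% confidence ($\alpha = 0$) the interval is therefore $(-\infty, \infty)$: guaranteed to cover the true mean, and uninformative for exactly that reason.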

Our R package for VCBART, or fitting BART-based varying coefficient models, is now available on CRAN! Useful for flexible regression modeling + can be used to estimate heterogeneous treatment effects in causal inference by specifying X and Z appropriately. Check it out: cran.r-project.org/web/packages...

4 months ago 1 0 0 0
Colleges Are Preparing to Self-Lobotomize
The skills that students will need in an age of automation are precisely those that are eroded by inserting AI into the educational process.

Yes. "... the skills that future graduates will most need in the AI era—creative thinking, the capacity to learn new things, flexible modes of analysis—are precisely those that are likely to be eroded by inserting AI into the educational process."

4 months ago 1 0 0 0
Yesterday was a very sad day for all of Statistics: The University of Nebraska Board of Regents voted 9-1 (with 2 abstentions) to eliminate its Department of Statistics (https://lnkd.in/gJzJ_yki)… | C...

A sad day for the statistics community. U. of Nebraska Board of Regents voted to eliminate UNL's Department of Statistics.

4 months ago 1 0 0 0

I'm at the Brin Mathematics Research Center today for the Maryland Statistics Symposium! Presenting my work on generative quantile regression w/ former PhD student Dr. Shijie Wang (U. South Carolina '24) and Dr. Minsuk Shin of Yonsei U. (published in JCGS last year).

4 months ago 1 0 0 0
Maryland Statistical Symposium | Brin Mathematics Research Center

The Maryland Statistical Symposium looks awesome! brinmrc.umd.edu/fall25-mss/

So honored to be invited to speak at this event alongside many outstanding researchers, some of whose work I have followed and admired for years!

4 months ago 1 1 0 0
University of Nebraska-Lincoln Department of Statistics seminar "The Metrics" on November 6, 2025 (YouTube video by Chris Bilder)

If you're following the #UNL #statistics saga (proposed for elimination based on bad stats), you might find the seminar we gave yesterday interesting... youtu.be/fUk2R0UYWpA

It was weird to rail against someone for an hour, but strangely cathartic, and the #datavis seems to have been effective?

5 months ago 9 8 1 1