Posts by Filip Dechterenko
When I sit down to write I'm not consciously pulling from my notes, I'm drawing on the 6 months I spent reading biographies, watching hours of interviews, etc. There is a volume of research, of effort & time that is required to create something meaningful to draw from.
Will you incorporate LLMs and AI prompting into the course in the future? No. Why won’t you incorporate LLMs and AI prompting into the course? These tools are useful for coding (see this post for my personal take). However, they’re only useful if you know what you’re doing first. If you skip the learning-the-process-of-writing-code step and just copy/paste output from ChatGPT, you will not learn. You cannot learn. You cannot improve. You will not understand the code.
That post warns that you cannot use these tools as a beginner: …to use Databot effectively and safely, you still need the skills of a data scientist: background and domain knowledge, data analysis expertise, and coding ability. There is no LLM-based shortcut to those skills. You cannot LLM your way into domain knowledge, data analysis expertise, or coding ability. The only way to gain domain knowledge, data analysis expertise, and coding ability is to struggle. To get errors. To google those errors. To look over the documentation. To copy/paste your own code and adapt it for different purposes. To explore messy datasets. To struggle to clean those datasets. To spend an hour looking for a missing comma. This isn’t a form of programming hazing, like “I had to walk to school uphill both ways in the snow and now you must too.” It’s the actual process of learning and growing and developing and improving. You’ve gotta struggle.
This Tumblr post puts it well (it’s about art specifically, but it applies to coding and data analysis too): Contrary to popular belief the biggest beginner’s roadblock to art isn’t even technical skill it’s frustration tolerance, especially in the age of social media. It hurts and the frustration is endless but you must build the frustration tolerance equivalent to a roach’s capacity to survive a nuclear explosion. That’s how you build on the technical skill. Throw that “won’t even start because I’m afraid it won’t be perfect” shit out the window. Just do it. Just start. Good luck. (The original post has disappeared, but here’s a reblog.) It’s hard, but struggling is the only way to learn anything.
You might not enjoy code as much as Williams does (or I do), but there’s still value in maintaining coding skills as you improve and learn more. You don’t want your skills to atrophy. As I discuss here, when I do use LLMs for coding-related tasks, I purposely throw as much friction into the process as possible: To avoid falling into over-reliance on LLM-assisted code help, I add as much friction into my workflow as possible. I only use GitHub Copilot and Claude in the browser, not through the chat sidebar in Positron or Visual Studio Code. I treat the code they generate like random answers from Stack Overflow or blog posts and generally rewrite it completely. I disable inline LLM-based autocomplete in text editors. For routine tasks like generating {roxygen2} documentation scaffolding for functions, I use the {chores} package, which requires a bunch of pointing and clicking to use. Even though I use Positron, I purposely do not use either Positron Assistant or Databot; I have them disabled. So in the end, for pedagogical reasons, I don’t foresee myself incorporating LLMs into this class. I’m pedagogically opposed to it. I’m facing all sorts of external pressure to do it, but I’m resisting. You’ve got to learn first.
Some closing thoughts for my students this semester on LLMs and learning #rstats datavizf25.classes.andrewheiss.com/news/2025-12...
It’s not too late to apply for the PhD position in my lab! Please send your documents (cover letter, CV, transcripts, names of references) through the official application platform by Nov 25!
The advances we've made in statistics, experimental study design, and causal inference over the past century are remarkably useful for understanding our world. But there has never been a push to make people use them like the one we're seeing with generative AI. Perhaps take a moment to consider why.
Check out our new Psych Science paper w/ Daniil Azarov & Daniil Grigorev. Although the ability to recognize a familiar object among new ones clearly depends on how many and which objects there are, we show remarkable stability of the underlying "representational spaces"
journals.sagepub.com/doi/10.1177/...
Fitting a generalized linear mixed model with a gamma distribution, log link, and random slopes to reaction-time data, only to arrive at precisely the same point estimate the authors got by simply averaging and conducting a t-test:
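The gist of the joke can be sketched in a few lines of base R (simulated data; the random-slopes part is omitted, since it would need lme4 or brms): for a gamma GLM with a log link and a single two-level factor, the exponentiated coefficient is exactly the ratio of the two condition means, i.e., the same information simple averaging gives you.

```r
# Simulated reaction-time data (hypothetical): two conditions with
# gamma-distributed RTs averaging roughly 400 ms and 450 ms
set.seed(42)
n <- 200
condition <- rep(c("control", "treatment"), each = n)
rt <- c(rgamma(n, shape = 5, rate = 5 / 400),
        rgamma(n, shape = 5, rate = 5 / 450))

# Gamma GLM with a log link. With a single two-level factor predictor,
# the score equations force the fitted group means to equal the sample
# means, so exp(coefficient) is exactly the ratio of condition means
fit <- glm(rt ~ condition, family = Gamma(link = "log"))
ratio_glm   <- unname(exp(coef(fit)["conditiontreatment"]))
ratio_means <- mean(rt[condition == "treatment"]) / mean(rt[condition == "control"])

abs(ratio_glm - ratio_means)  # effectively zero: same point estimate
```

The fancy machinery buys you correct uncertainty intervals, not a different point estimate, which is the point of the quip.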
A new use of the asterisk in the paper author list for credit assignment
Psych-DS is (1) spellcheck for your datasets and (2) a pathway to standardizing data in our academic fields that *everyone* can learn.
And it's live RIGHT NOW!
psych-ds.github.io
(This is the announcement post I've been leading up to)
Nice tutorial on how to do signal detection analyses in R with the brms package
@matti.vuorre.com
osf.io/preprints/ps...
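For orientation, the quantities the tutorial estimates with brms are the classic equal-variance signal detection measures; here is the textbook point-estimate version in base R (counts are hypothetical, and this is not the tutorial's code):

```r
# Classic equal-variance signal detection measures from raw counts.
# The linked tutorial estimates these in a Bayesian probit-regression
# framework with brms; this is the simple closed-form version
sdt_measures <- function(hits, misses, false_alarms, correct_rejections) {
  hit_rate <- hits / (hits + misses)
  fa_rate  <- false_alarms / (false_alarms + correct_rejections)
  c(dprime    = qnorm(hit_rate) - qnorm(fa_rate),           # sensitivity
    criterion = -0.5 * (qnorm(hit_rate) + qnorm(fa_rate)))  # response bias
}

# Hypothetical counts: 80/100 hits, 20/100 false alarms
sdt_measures(hits = 80, misses = 20, false_alarms = 20, correct_rejections = 80)
# dprime ≈ 1.68, criterion = 0
```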
Screenshot of the linked Quarto website, with input checkboxes to change different conditions for a regression model that predicts economic performance based on US political party, with a reported p-value
I’ve long used FiveThirtyEight’s interactive “Hack Your Way To Scientific Glory” to illustrate the idea of p-hacking when I teach statistics. But ABC/Disney killed the site earlier this month :(
So I made my own with #rstats and Observable and #QuartoPub ! stats.andrewheiss.com/hack-your-way/
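The logic behind such a p-hacking demo can be sketched in base R (all variables hypothetical; this is not the site's actual code): under a true null, fitting a regression for every subset of optional control variables gives you hundreds of p-values to choose from.

```r
# Simulate a true-null world and fit a regression for every subset of
# eight optional control variables, keeping the p-value on the predictor
# of interest each time
set.seed(538)
n <- 100
outcome  <- rnorm(n)              # no true effect of anything
x        <- rnorm(n)              # the predictor we "care" about
controls <- matrix(rnorm(n * 8), nrow = n)

# 2^8 = 256 specifications: every subset of the eight controls
pvals <- sapply(0:255, function(i) {
  keep <- which(bitwAnd(i, 2^(0:7)) > 0)
  X <- cbind(x, controls[, keep, drop = FALSE])
  summary(lm(outcome ~ X))$coefficients[2, 4]  # p-value on x
})

# With this many researcher degrees of freedom, the smallest p-value
# tends to dip well below the magic .05 threshold
min(pvals)
```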
With a heavy heart, I've decided to suspend all academic travel to the USA for me and my lab. Given the escalation of tensions and uncertainties, it seems to be the wisest move. To our US colleagues, please be certain that we will continue to do what we can to support you. Science is global.
I really liked this idea of using a histogram as a legend in a choropleth map (since land isn't unemployed; people are), so I made a little guide to doing it with #rstats, {ggplot2}, and {patchwork}
www.andrewheiss.com/blog/2025/02...
New blog up: solomonkurz.netlify.app/blog/2025-02...
This time I dip my toes into causal inference for quasi-experiments using matching methods, and my use case has missing data complications. Many thanks to @dingdingpeng.the100.ci and
@noahgreifer.bsky.social
for their peer review! #RStats
🚨 I am soft-launching a full stable version of Ridian, which brings R to Obsidian. Check out the website, and download the plugin from the Obsidian app or plugin website. #rstats #quartopub #obsidian
Causal methods peeps. Can you point me to a good intro reading on DAGs? Something more easily digestible than Pearl's primary papers but more technical than the kinds of 30,000-ft summaries you get from a Google search.
Great, this will be my first time at ESCoP as well
Folks who use #rstats with github, how am I supposed to be managing the data for my project with 100mb file size limit? Am I going about this all wrong?
It was a really great talk!
New blog post! Read about Posit's new Positron editor, see some of the neat new features it has, and check out the settings and extensions I use. It includes a bonus workaround for connecting to a remote server with SSH! #rstats
Our new paper: ChatGPT improves creativity, boosts self-efficacy, and makes problem-solving tasks feel easier and require less mental effort.
Magic link to pass through authors.elsevier.com/a/1iqFi1Hucd...
Before we calculate these different treatment effects with the realized outcomes instead of the hypothetical potential outcomes, let’s look really quick at the practical difference between the true ATE, ATT, and ATU. All three estimands are useful for policymaking! The ATE is −15, implying that mosquito nets cause a 15 point reduction in malaria risk for every person in the country. This includes people who live at high elevations where mosquitoes don’t live, people who live near mosquito-infested swamps, people who are rich enough to buy Bill Gates’s mosquito laser, and people who can’t afford a net but would really like to use one. If we worked in the Ministry of Health and wanted to know if we should make a new national program that gave everyone a free bed net, the overall reduction in risk is −15, which is probably pretty good! The ATT is −16.29, which is bigger than the ATE. The effect of net usage is bigger for people who are already using the nets. This is because of underlying self-selection: the people already using nets tend to be the ones who benefit most from them.
Mirrored histogram showing “weird” parts of the population: treated people who were unlikely to be treated, and untreated people who were likely to be treated
Mirrored histogram showing pseudo-populations of treated and untreated people that have been reweighted to be more comparable and unconfounded
New blog post! Have you (like me!) wondered what the ATT means and how it's different from average treatment effects? I use #rstats to explore why we care about the ATE, ATT, and ATU + how to calculate them with observational data! #polisky #episky #econsky www.andrewheiss.com/blog/2024/03...
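The arithmetic behind the ATE, ATT, and ATU is easy to sketch in base R with made-up potential outcomes (the numbers below are illustrative and not from the post):

```r
# Hypothetical potential outcomes: y0 = malaria risk without a net,
# y1 = risk with a net; treated = whether the person actually used one
po <- data.frame(
  y0      = c(60, 70, 50, 40, 80, 65),
  y1      = c(40, 45, 40, 35, 60, 50),
  treated = c(1, 1, 1, 0, 0, 0)
)
po$effect <- po$y1 - po$y0  # individual effects (unobservable in real data)

ate <- mean(po$effect)                    # average over everyone
att <- mean(po$effect[po$treated == 1])   # average over the treated
atu <- mean(po$effect[po$treated == 0])   # average over the untreated

# The ATE is just the treatment-share-weighted average of the ATT and ATU
p_treated <- mean(po$treated)
ate_check <- p_treated * att + (1 - p_treated) * atu
all.equal(ate, ate_check)  # TRUE
```

In observational data you only ever see one potential outcome per person, which is why the blog post then turns to weighting and matching to recover these quantities.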
Title page for "The effects of more informative grading on student outcomes" in the Journal of Economic Behavior and Organization, with this abstract: More granular grading scales provide a more accurate assessment of achievement and thus provide students with more informative feedback on their performance. Using Swedish administrative data and exploiting a natural experiment, we identify the effects of moving from a system with three passing grades to one with five passing grades. Students receiving more informative grades are less likely to graduate from high school, from academic high school tracks, and from STEM and art high school tracks. Affected students are also less likely to enrol in STEM courses at university. The evidence suggests discouragement as a likely mechanism, with students revising their self-belief downward when receiving more informative feedback.
My own grading system details: Problem sets. To practice writing R code, running inferential models, and thinking about causation, you will complete a series of problem sets. You need to show that you made a good-faith effort to work each question. I will not grade these in detail. The problem sets will be graded using a check system: ✔+: (33 points (110%) in gradebook) Assignment is 100% completed. Every question was attempted and answered, and most answers are correct. Document is clean and easy to follow. Work is exceptional. I will not assign these often. ✔: (30 points (100%) in gradebook) Assignment is 70–99% complete and most answers are correct. This is the expected level of performance. ✔−: (15 points (50%) in gradebook) Assignment is less than 70% complete and/or most answers are incorrect. This indicates that you need to improve next time. I will hopefully not assign these often.
This is a really neat paper that argues that more detailed grading systems (e.g., A–F) are *worse* for student motivation and outcomes than simpler ones doi.org/10.1016/j.je...
It tracks with my own check-based grading system (✓, ✓+, and ✓−), and now I have more evidence backing that up :)
Rel = 1 - (SEE^2)/V_T
New post on estimating the reliability of parameters in multilevel models. There's an easy solution using the standard errors of your shrunk parameters. It feels kind of obvious, but maybe it isn't sufficiently obvious yet.
rubenarslan.github.io/posts/2024-0...
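Plugging hypothetical numbers into the formula above (on my reading, SEE is the standard error of a shrunken parameter estimate and V_T the variance of the true parameter values; the values below are illustrative, not from the post):

```r
# Illustrative numbers for Rel = 1 - SEE^2 / V_T
see <- 0.3   # standard error of a shrunken (posterior) estimate
v_t <- 0.9   # variance of the true parameter values
reliability <- 1 - see^2 / v_t
reliability  # 0.9
```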
Categorical and continuous reproductions in memory
'Noisy and hierarchical visual memory across timescales', a new Review by Timothy F. Brady (@timbrady.bsky.social), Maria M. Robinson & Jamal R. Williams (@jamalamal.bsky.social)
Web: go.nature.com/42Bhac0
PDF: rdcu.be/dyb6G
#psychology #psychscisky #cogpsyc
Want to learn how to use Docker for reproducible data science with R/RStudio, but not sure where to start? I just re-recorded a recent workshop talk: www.youtube.com/watch?v=uvbb... #rstats #statistics #psychology #docker
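As a taste of what a setup like this involves, a minimal Dockerfile for an R project might look like the sketch below (the image tag and package list are illustrative, not taken from the video):

```dockerfile
# rocker/rstudio ships R + RStudio Server pinned to a specific R version,
# which is the core of the reproducibility story
FROM rocker/rstudio:4.4.1

# install2.r is a helper bundled with rocker images; list the packages
# your analysis needs (these two are just examples)
RUN install2.r --error tidyverse brms

# Copy the project in so the container is self-contained
COPY . /home/rstudio/project
```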
Using deep neural networks to disentangle visual and semantic information in human perception and memory
www.nature.com/articles/s41...
Spatial representations of natural scene images generalize across individuals, tasks, and viewing time: http://osf.io/243xw/