I’m in The Times today talking about how we judge probability-based language and what happens when words mean different things to different people.
This follows an online quiz I’ve been running at probability.kucharski.io over the past few weeks, with 5000+ participants and counting.
Posts by Dr. Matias Valdenegro
Every American needs to watch this:
Adrián Detavernier, Jasper De Bock: Robustness quantification and how it allows for reliable classification, even in the presence of distribution shift and for small training sets https://arxiv.org/abs/2503.22418
📊 Class 3 of my online course is now live! This week we tackle scientific literacy—how to read research papers, evaluate expert credentials, and spot the red flags in misleading health headlines.
matthewfacciani.substack.com/p/class-3-ho...
Hack the planet!
No, you did not give those of us who happened to look like the people who bombed Pearl Harbor any due process. And that was profoundly wrong. It destroyed our lives.
Screenshot of the first page of a paper preprint titled "Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor" by Olteanu et al. Paper abstract: "In AI research and practice, rigor remains largely understood in terms of methodological rigor -- such as whether mathematical, statistical, or computational methods are correctly applied. We argue that this narrow conception of rigor has contributed to the concerns raised by the responsible AI community, including overblown claims about AI capabilities. Our position is that a broader conception of what rigorous AI research and practice should entail is needed. We believe such a conception -- in addition to a more expansive understanding of (1) methodological rigor -- should include aspects related to (2) what background knowledge informs what to work on (epistemic rigor); (3) how disciplinary, community, or personal norms, standards, or beliefs influence the work (normative rigor); (4) how clearly articulated the theoretical constructs under use are (conceptual rigor); (5) what is reported and how (reporting rigor); and (6) how well-supported the inferences from existing evidence are (interpretative rigor). In doing so, we also aim to provide useful language and a framework for much-needed dialogue about the AI community's work by researchers, policymakers, journalists, and other stakeholders."
We have to talk about rigor in AI work and what it should entail. The reality is that impoverished notions of rigor do not only lead to some one-off undesirable outcomes but can have a deeply formative impact on the scientific integrity and quality of both AI research and practice 1/
We despise immigrants for not putting down roots, even as we make sure that it is impossible for them to do so. We do this because we have no idea what we want.
open.substack.com/pub/iandunt/...
I'm embarrassed for the New York Times that they published this piece on Ms. Rachel, in which they cite a ridiculous anonymous right-wing website Stopantisemitism while indulging the mad, mad claim she may be funded by Hamas (!).
This isn't journalism:
Just out! Our peer-reviewed critique of the Cass Review has been published by BMC Medical Research Methodology. Please read and share. We show that the Cass Review is fatally flawed and should not be the basis for policy or practice in transgender healthcare.
link.springer.com/article/10.1...
Aleatoric and epistemic uncertainty are clear-cut concepts, right? ... right? 😵💫 In our new ICLR blogpost we let different schools of thought speak and contradict each other, and revisit chatbots where “the character of aleatory ‘transforms’ into epistemic” iclr-blogposts.github.io/2025/blog/re...
@bagleycartoons.bsky.social
I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them
I'm late for #ICLR2025 #NAACL2025, but in time for #AISTATS2025 #ICML2025! 1/3
kamathematics.wordpress.com/2025/05/01/t...
When an AI model for code-editing company Cursor hallucinated a new rule, users revolted. www.wired.com/story/cursor...
Even accepting the premise that AI produces useful writing (which no one should), using AI in education is like using a forklift at the gym. The weights do not actually need to be moved from place to place. That is not the work. The work is what happens within you.
A tweet by Sarah Longwell (@SarahLongwell25) reads: "He’s threatening media companies who are critical of him. He’s talking about sending Americans to foreign prisons. He’s signing executive orders to investigate former staff members who spoke out against him. Don’t you see what’s happening here?"
I see it. I have lived it. 83 years ago, the U.S. government turned upon a group of its own citizens and residents and sent them to internment camps without due process. I was there among them. American fascism is back. It is here. It is now.
So I am leading this group building great teaching materials for scientific rigor (c4r.io). Their first unit is really coming together and I will teach it (Monday, April 21, 2025, 12:00-1:00pm EST) to see how well it works. Join us: forms.monday.com/forms/7d978e...
I've really enjoyed reading this "workography" by Kees van Deemter, whom I've never met but who has had a long career in NLP. Lots of storytelling and reflections on research, moving between institutions and countries, finding mentors, choosing between academia and industry, and more.
This study introduces a method for calibrating certainty expressions, transforming phrases like "Maybe" into probability distributions. This enhances decision-making for radiologists and fine-tunes AI models, improving uncertainty communication. https://arxiv.org/abs/2410.04315
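The post describes turning certainty phrases like "Maybe" into probability distributions. A minimal sketch of that idea, assuming a hypothetical phrase-to-Beta mapping (the parameters below are illustrative placeholders, not the paper's calibration data):

```python
# Hypothetical sketch: map certainty phrases to Beta distributions over
# the probability that the statement is true. Alpha/beta values are
# illustrative assumptions, NOT taken from the cited study.

PHRASE_TO_BETA = {
    "almost certainly": (18, 2),   # mean 0.90
    "likely":           (7, 3),    # mean 0.70
    "maybe":            (5, 5),    # mean 0.50
    "unlikely":         (3, 7),    # mean 0.30
}

def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

def beta_variance(alpha: float, beta: float) -> float:
    """Variance of Beta(alpha, beta): ab / ((a+b)^2 (a+b+1))."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1))

def calibrate(phrase: str) -> tuple[float, float]:
    """Return (mean, variance) of the distribution assigned to a phrase."""
    a, b = PHRASE_TO_BETA[phrase.lower()]
    return beta_mean(a, b), beta_variance(a, b)
```

Representing each phrase as a full distribution, rather than a point probability, is what lets the variance capture how diffuse people's interpretations of a phrase are.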
How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online C...
Giuseppe Serra, Ben Werner, Florian Buettner
Action editor: Emmanuel Bengio
https://openreview.net/forum?id=dczXe0S1oL
#forgetting #memory #forget
March 31st is Trans Day of Visibility.
Enjoying this game very much!
Despite popularised beliefs, LLMs are not fit for medical applications. SoTA models produce "non-trivial levels of hallucinations" even with inference techniques like CoT & search-augmented generation: arxiv.org/pdf/2503.05777
of surveyed clinicians, 53% use LLMs daily & 91% encountered hallucinations
While reading Ben Recht's article, I found Foster & Hart (2021) (arxiv.org/abs/2210.07169) quite interesting. The contribution is a proposal of always-calibrated forecaster based on a continuously-relaxed calibration measure. But I actually love their §1.1 motivating calibration.
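For context on what "calibrated" means here: a forecaster is calibrated if, among the times it predicts roughly probability p, the event occurs roughly a p fraction of the time. A minimal sketch of the plain binned calibration error (the simple discrete measure, not the continuously-relaxed one Foster & Hart propose):

```python
# Illustrative binned calibration error for binary forecasts.
# This is the standard binned version, NOT the continuously-relaxed
# calibration measure from Foster & Hart (2021).

def binned_calibration_error(forecasts, outcomes, n_bins=10):
    """Weighted average, over bins, of |mean forecast - empirical frequency|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p=1.0 into last bin
        bins[idx].append((p, y))
    total = len(forecasts)
    err = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_p = sum(p for p, _ in bucket) / len(bucket)
        freq = sum(y for _, y in bucket) / len(bucket)
        err += (len(bucket) / total) * abs(avg_p - freq)
    return err
```

The binned measure is discontinuous in the forecasts (a forecast crossing a bin boundary can jump the score), which is one motivation for continuous relaxations like the one in the paper.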
📣 New paper! The field of AI research is increasingly realising that benchmarks are very limited in what they can tell us about AI system performance and safety. We argue and lay out a roadmap toward a *science of AI evaluation*: arxiv.org/abs/2503.05336 🧵
91% of medical professionals using LLMs have encountered hallucinations and 84% believe they could impact patient health arxiv.org/abs/2503.05777
"Germany Tried to Silence Me, a UN Official, for Talking About Israel’s Genocidal War in Gaza"
In an exclusive piece for Zeteo, UN Special Rapporteur Francesca Albanese writes about her 5-day trip that exposed Germany's harsh deviation from democratic values:
Docs is an open source collaborative text editor created by a joint effort from the French 🇫🇷 and German 🇩🇪 governments.
github.com/suitenumeriq...