I’m in The Times today talking about how we judge probability-based language and what happens when words mean different things to different people.
This follows an online quiz I’ve been running at probability.kucharski.io over the past few weeks, with 5000+ participants and counting.
Posts by Dr. Matias Valdenegro
Every American needs to watch this:
Adrián Detavernier, Jasper De Bock: Robustness quantification and how it allows for reliable classification, even in the presence of distribution shift and for small training sets https://arxiv.org/abs/2503.22418
📊 Class 3 of my online course is now live! This week we tackle scientific literacy—how to read research papers, evaluate expert credentials, and spot the red flags in misleading health headlines.
matthewfacciani.substack.com/p/class-3-ho...
Hack the planet!
No, you did not give those of us who happened to look like the people who bombed Pearl Harbor any due process. And that was profoundly wrong. It destroyed our lives.
Screenshot of the first page of a paper preprint titled "Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor" by Olteanu et al. Paper abstract: "In AI research and practice, rigor remains largely understood in terms of methodological rigor -- such as whether mathematical, statistical, or computational methods are correctly applied. We argue that this narrow conception of rigor has contributed to the concerns raised by the responsible AI community, including overblown claims about AI capabilities. Our position is that a broader conception of what rigorous AI research and practice should entail is needed. We believe such a conception -- in addition to a more expansive understanding of (1) methodological rigor -- should include aspects related to (2) what background knowledge informs what to work on (epistemic rigor); (3) how disciplinary, community, or personal norms, standards, or beliefs influence the work (normative rigor); (4) how clearly articulated the theoretical constructs under use are (conceptual rigor); (5) what is reported and how (reporting rigor); and (6) how well-supported the inferences from existing evidence are (interpretative rigor). In doing so, we also aim to provide useful language and a framework for much-needed dialogue about the AI community's work by researchers, policymakers, journalists, and other stakeholders."
We have to talk about rigor in AI work and what it should entail. The reality is that impoverished notions of rigor do not only lead to some one-off undesirable outcomes but can have a deeply formative impact on the scientific integrity and quality of both AI research and practice 1/
We despise immigrants for not putting down roots, even as we make sure that it is impossible for them to do so. We do this because we have no idea what we want.
open.substack.com/pub/iandunt/...
I'm embarrassed for the New York Times that they published this piece on Ms. Rachel, in which they cite a ridiculous anonymous right-wing website Stopantisemitism while indulging the mad, mad claim she may be funded by Hamas (!).
This isn't journalism:
Just out! Our peer-reviewed critique of the Cass Review has been published by BMC Medical Research Methodology. Please read and share. We show that the Cass Review is fatally flawed and should not be the basis for policy or practice in transgender healthcare.
link.springer.com/article/10.1...
Aleatoric and epistemic uncertainty are clear-cut concepts, right? ... right? 😵💫 In our new ICLR blogpost we let different schools of thought speak and contradict each other, and revisit chatbots where “the character of aleatory ‘transforms’ into epistemic” iclr-blogposts.github.io/2025/blog/re...
@bagleycartoons.bsky.social
I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them
I'm late for #ICLR2025 #NAACL2025, but in time for #AISTATS2025 #ICML2025! 1/3
kamathematics.wordpress.com/2025/05/01/t...
When an AI model for code-editing company Cursor hallucinated a new rule, users revolted. www.wired.com/story/cursor...
Even accepting the premise that AI produces useful writing (which no one should), using AI in education is like using a forklift at the gym. The weights do not actually need to be moved from place to place. That is not the work. The work is what happens within you.
A tweet by Sarah Longwell (@SarahLongwell25) reads: "He’s threatening media companies who are critical of him. He’s talking about sending Americans to foreign prisons. He’s signing executive orders to investigate former staff members who spoke out against him. Don’t you see what’s happening here?"
I see it. I have lived it. 83 years ago, the U.S. government turned upon a group of its own citizens and residents and sent them to internment camps without due process. I was there among them. American fascism is back. It is here. It is now.
So I am leading this group building great teaching materials for scientific rigor (c4r.io). Their first unit is really coming together and I will teach it (Monday, April 21, 2025, 12:00-1:00pm EST) to see how well it works. Join us: forms.monday.com/forms/7d978e...
I've really enjoyed reading this "workography" by Kees van Deemter, whom I've never met but who has had a long career in NLP. Lots of storytelling and reflections on research, moving between institutions and countries, finding mentors, choosing between academia and industry, and more.
This study introduces a method for calibrating certainty expressions, transforming phrases like "Maybe" into probability distributions. This enhances decision-making for radiologists and fine-tunes AI models, improving uncertainty communication. https://arxiv.org/abs/2410.04315
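The post describes turning certainty phrases like "Maybe" into probability distributions. A minimal sketch of that idea, assuming a hypothetical phrase-to-Beta mapping (the parameters below are illustrative placeholders, not the paper's calibration data):

```python
# Hypothetical sketch: map certainty phrases to Beta distributions over
# the probability that the statement is true. Alpha/beta values are
# illustrative assumptions, NOT taken from the cited study.

PHRASE_TO_BETA = {
    "almost certainly": (18, 2),   # mean 0.90
    "likely":           (7, 3),    # mean 0.70
    "maybe":            (5, 5),    # mean 0.50
    "unlikely":         (3, 7),    # mean 0.30
}

def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

def beta_variance(alpha: float, beta: float) -> float:
    """Variance of Beta(alpha, beta): ab / ((a+b)^2 (a+b+1))."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1))

def calibrate(phrase: str) -> tuple[float, float]:
    """Return (mean, variance) of the distribution assigned to a phrase."""
    a, b = PHRASE_TO_BETA[phrase.lower()]
    return beta_mean(a, b), beta_variance(a, b)
```

Representing each phrase as a full distribution, rather than a point probability, is what lets the variance capture how diffuse people's interpretations of a phrase are.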
How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online C...
Giuseppe Serra, Ben Werner, Florian Buettner
Action editor: Emmanuel Bengio
https://openreview.net/forum?id=dczXe0S1oL
#forgetting #memory #forget
March 31st is Trans Day of Visibility.
Enjoying this game very much!
Despite popularised beliefs, LLMs are not fit for medical applications. SoTA models produce "non-trivial levels of hallucinations" even with inference techniques like CoT & search-augmented generation: arxiv.org/pdf/2503.05777
of surveyed clinicians, 53% use LLMs daily & 91% encountered hallucinations
While reading Ben Recht's article, I found Foster & Hart (2021) (arxiv.org/abs/2210.07169) quite interesting. The contribution is a proposal of always-calibrated forecaster based on a continuously-relaxed calibration measure. But I actually love their §1.1 motivating calibration.
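For context on what "calibrated" means here: a forecaster is calibrated if, among the times it predicts roughly probability p, the event occurs roughly a p fraction of the time. A minimal sketch of the plain binned calibration error (the simple discrete measure, not the continuously-relaxed one Foster & Hart propose):

```python
# Illustrative binned calibration error for binary forecasts.
# This is the standard binned version, NOT the continuously-relaxed
# calibration measure from Foster & Hart (2021).

def binned_calibration_error(forecasts, outcomes, n_bins=10):
    """Weighted average, over bins, of |mean forecast - empirical frequency|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p=1.0 into last bin
        bins[idx].append((p, y))
    total = len(forecasts)
    err = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_p = sum(p for p, _ in bucket) / len(bucket)
        freq = sum(y for _, y in bucket) / len(bucket)
        err += (len(bucket) / total) * abs(avg_p - freq)
    return err
```

The binned measure is discontinuous in the forecasts (a forecast crossing a bin boundary can jump the score), which is one motivation for continuous relaxations like the one in the paper.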
📣 New paper! The field of AI research is increasingly realising that benchmarks are very limited in what they can tell us about AI system performance and safety. We argue and lay out a roadmap toward a *science of AI evaluation*: arxiv.org/abs/2503.05336 🧵
91% of medical professionals using LLMs have encountered hallucinations and 84% believe they could impact patient health arxiv.org/abs/2503.05777
"Germany Tried to Silence Me, a UN Official, for Talking About Israel’s Genocidal War in Gaza"
In an exclusive piece for Zeteo, UN Special Rapporteur Francesca Albanese writes about her 5-day trip that exposed Germany's harsh deviation from democratic values:
Docs is an open source collaborative text editor created by a joint effort from the French 🇫🇷 and German 🇩🇪 governments.
github.com/suitenumeriq...