A Gaussian mechanism with ε = 6 can be less private than one with ε = 8. This points to a problem with how we report privacy guarantees in machine learning. A thread 🧵
Posts by Bogdan Kulynych
Here's a paper with all the details:
arxiv.org/abs/2503.10945
Presenting on Mar 24 at #SatML
w/ Felipe Gomez, Borja Balle, Jamie Hayes, @fcalmon.bsky.social, @ahonkela.bsky.social
We've built a new Python package gdpnum to compute non-asymptotic GDP guarantees and estimate their precision for many practical algorithms:
github.com/interpretabl...
But when it fits, it solves both problems: correct comparison via μ, and an easy, precise mapping to full privacy profiles and attack success rates. We should use it to report guarantees for DP-SGD and other Gaussian-like mechanisms!
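As a minimal sketch of the "mapping to full privacy profiles" part (using scipy rather than gdpnum, whose API isn't shown in the thread): a mechanism is μ-GDP exactly when it is (ε, δ(ε))-DP for every ε ≥ 0, with δ(ε) = Φ(μ/2 − ε/μ) − e^ε · Φ(−μ/2 − ε/μ).

```python
import numpy as np
from scipy.stats import norm

def gdp_privacy_profile(mu, eps):
    """delta(eps) of a mu-GDP mechanism (Dong, Roth & Su, 2019)."""
    return norm.cdf(mu / 2 - eps / mu) - np.exp(eps) * norm.cdf(-mu / 2 - eps / mu)

mu = 1.0  # illustrative value, not one from the thread
for eps in [1.0, 2.0, 4.0, 8.0]:
    print(f"eps = {eps}: delta = {gdp_privacy_profile(mu, eps):.2e}")
```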
GDP isn't always the right choice. Some algorithms can't be precisely represented by it (e.g., Randomized Response), and for others we don't yet know how (e.g., Report-Noisy-Max).
How precise? We can measure it with a metric called "regret." For the DP-SGD instance above, GDP has a regret of 0.1%, while a single (ε, δ) pair has a regret of 21%. Rule of thumb: with noise σ ≥ 2 and T ≥ 400 iterations, GDP is very accurate for DP-SGD.
arxiv.org/abs/2406.08918
Here's an example for a specific instantiation of DP-SGD in terms of f-DP trade-off curves (an equivalent operational version of privacy profiles). As we see, a non-asymptotic GDP trade-off curve fits the DP-SGD trade-off curve almost exactly.
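For reference, the GDP curve being overlaid has a closed form: against a μ-GDP mechanism, any attack with false positive rate α has false negative rate at least f_μ(α) = Φ(Φ⁻¹(1 − α) − μ). A quick sketch to evaluate it (the DP-SGD curve itself needs a numerical accountant, which isn't reproduced here):

```python
import numpy as np
from scipy.stats import norm

def gdp_tradeoff(alpha, mu):
    """f_mu(alpha): lowest achievable false negative rate of any attack
    with false positive rate alpha against a mu-GDP mechanism."""
    alpha = np.asarray(alpha, dtype=float)
    return norm.cdf(norm.ppf(1 - alpha) - mu)

alphas = np.linspace(1e-4, 1 - 1e-4, 200)
betas = gdp_tradeoff(alphas, mu=1.0)  # mu = 1.0 is an illustrative value
# Plotting (alphas, betas) gives a trade-off curve like the one in the figure.
```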
Many ML algorithms, especially those involving many compositions like DP-SGD, can be very precisely characterized with GDP. This is a *non-asymptotic* result, not just a central limit approximation!
GDP characterizes the entire privacy profile ε(δ) of a Gaussian mechanism exactly using a single number μ. Interpretation: if a mechanism satisfies μ-GDP, then running membership inference against it is as hard as distinguishing N(0,1) from N(μ,1) based on a single observation.
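That interpretation also gives a closed-form worst-case attack accuracy: the optimal single-sample test between N(0,1) and N(μ,1) thresholds at μ/2, so the best balanced membership-inference accuracy against a μ-GDP mechanism is Φ(μ/2). A small sketch with a Monte-Carlo sanity check (μ = 1 is illustrative):

```python
import numpy as np
from scipy.stats import norm

def worst_case_mia_accuracy(mu):
    """Best balanced accuracy of membership inference against a mu-GDP
    mechanism = best single-sample test of N(0,1) vs N(mu,1)."""
    return norm.cdf(mu / 2)

mu = 1.0
rng = np.random.default_rng(0)
absent = rng.normal(0.0, 1.0, 100_000)   # test statistic when the record is absent
present = rng.normal(mu, 1.0, 100_000)   # test statistic when the record is present
empirical = 0.5 * ((absent < mu / 2).mean() + (present >= mu / 2).mean())
print(empirical, worst_case_mia_accuracy(mu))  # both ~0.69 for mu = 1
```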
Can we do better without reporting an entire privacy profile? Yes! With Gaussian differential privacy (GDP).
Because the convention sets δ in a data-dependent way (it typically scales with the dataset size), this matters whenever you compare models across datasets or papers.
Issue 2: You can't properly compare two mechanisms by ε alone if their δ values differ. For example, a Gaussian mechanism with ε = 6 at δ = 10⁻⁵ is less private than one with ε = 8 at δ = 10⁻⁹.
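A sketch of how to check this claim (again scipy rather than gdpnum): a sensitivity-1 Gaussian mechanism with noise σ is μ-GDP with μ = 1/σ, and its exact privacy profile is δ(ε) = Φ(μ/2 − ε/μ) − e^ε · Φ(−μ/2 − ε/μ). Solving for the μ whose profile passes through each (ε, δ) pair puts both mechanisms on a single comparable scale; per the thread, the ε = 6, δ = 10⁻⁵ mechanism comes out with the larger μ, i.e. less private.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def gdp_delta(mu, eps):
    """delta(eps) of a mu-GDP (sensitivity-1 Gaussian with sigma = 1/mu) mechanism."""
    return norm.cdf(mu / 2 - eps / mu) - np.exp(eps) * norm.cdf(-mu / 2 - eps / mu)

def mu_matching(eps, delta):
    """The mu whose privacy profile passes through the point (eps, delta)."""
    return brentq(lambda mu: gdp_delta(mu, eps) - delta, 1e-3, 20.0)

mu_a = mu_matching(6.0, 1e-5)  # epsilon = 6 at delta = 1e-5
mu_b = mu_matching(8.0, 1e-9)  # epsilon = 8 at delta = 1e-9
print(mu_a, mu_b)  # larger mu = less private
```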
No attacker in the universe can achieve that 98% rate: it's purely an artifact of compressing the entire privacy profile into one (ε, δ) pair. My colleagues and I examine this problem in detail in this NeurIPS'24 paper:
arxiv.org/abs/2407.02191
Issue 1: A single (ε, δ) pair can massively overstate privacy risk. Example: DP-SGD with ε = 8 at δ = 10⁻⁵ suggests a worst-case membership inference accuracy of ~98% using standard conversions. But using the full privacy profile, the actual maximum is only ~68%.
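For context, one generic way to turn any trade-off curve f into an attack-accuracy bound is balanced accuracy ≤ 1 − min over α of (α + f(α))/2, and a single (ε, δ) pair corresponds to the coarse piecewise-linear curve f(α) = max(0, 1 − δ − e^ε·α, e^(−ε)·(1 − δ − α)). A sketch of that recipe is below; it does not reproduce the 98% and 68% figures above, which come from the paper's conversions and DP-SGD accounting.

```python
import numpy as np

def epsdelta_tradeoff(alpha, eps, delta):
    """Piecewise-linear trade-off curve implied by a single (eps, delta) pair."""
    alpha = np.asarray(alpha, dtype=float)
    return np.maximum.reduce([
        np.zeros_like(alpha),
        1 - delta - np.exp(eps) * alpha,
        np.exp(-eps) * (1 - delta - alpha),
    ])

def max_balanced_accuracy(tradeoff):
    """Upper bound on balanced attack accuracy: 1 - min_alpha (alpha + f(alpha)) / 2."""
    alpha = np.linspace(0.0, 1.0, 1_000_001)
    return 1 - np.min(alpha + tradeoff(alpha)) / 2

# Illustrative values only (not the DP-SGD instance discussed above):
print(max_balanced_accuracy(lambda a: epsdelta_tradeoff(a, eps=1.0, delta=1e-5)))
```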
The standard way to report is to use a single (ε, δ) pair for a small δ. The community has developed informal conventions, e.g., ε < 10 is generally considered OK in privacy-preserving machine learning. But this convention has two big issues.
PSA:
My doomer-worry about AI is not that LLMs become omnipotent and take over the world, but that the wealthy and powerful use it as a means to consolidate power and marginalize or lay off skilled workers, and that everything about our technological, political and social life gets worse
I'll be @neuripsconf.bsky.social presenting Strategic Hypothesis Testing (spotlight!)
tldr: Many high-stakes decisions (e.g., drug approval) rely on p-values, but people submitting evidence respond strategically even w/o p-hacking. Can we characterize this behavior & how policy shapes it?
1/n
Nature CAREER COLUMN, 11 November 2025: "I have Einstein, Bohr and Feynman in my pocket." Grappling with difficulties in your career? Try asking an AI-powered advisory panel of experts, suggests Carsten Lund Pedersen.
The make-up of my advisory board often changes, and has included an eclectic mix. Besides Bohr, Feynman and Einstein, I've also tapped microbiologist Alexander Fleming, poet Piet Hein and anti-apartheid activist Nelson Mandela. Sometimes I include experts from my specific disciplines of AI and marketing; other times, I 'invite' artist Pablo Picasso or architect Bjarke Ingels for a completely different perspective. But whatever the board's composition, I typically retain a core group of three seminal scientists.
What do you do after you’re done jumping the shark?
Whatever it is, Nature Careers is all in.
Scientists and scholars in AI and its social impacts call on von der Leyen to retract #AIHype statement.
@olivia.science
@abeba.bsky.social
@irisvanrooij.bsky.social
@alexhanna.bsky.social
@rocher.lc
@danmcquillan.bsky.social
@robin.berjon.com
& many others have signed
www.iccl.ie/press-releas...
Russia’s success in poisoning LLMs with lies, and the effects it has on both AI and politics, reflects Russia’s much deeper understanding of how societies operate than much of Silicon Valley has - and how important the social sciences are in understanding and waging information warfare
arXiv will no longer accept review articles and position papers unless they have been accepted at a journal or a conference and have completed successful peer review.
This is because arXiv is being overwhelmed by hundreds of AI-generated papers a month.
Yet another open submission process killed by LLMs.
Pretends to be shocked
www.bbc.co.uk/mediacentre/...
The viral "Definition of AGI" paper tells you to read fake references which do not exist!
Proof: different articles appear at the specified journal/volume/page numbers, and their titles exist nowhere in any searchable repository.
Take this as a warning to not use LMs to generate your references!
What is good pseudoscience?
imho — anyone who equates a human with an app or a machine today is just dehumanizing people and stripping people of their (dwindling, already eroding, not well respected) rights.
Keynote at #COLM2025: Nicholas Carlini from Anthropic
"Are language models worth it?"
Explains that the prior decade of his work on adversarial images, while it taught us a lot, isn't very applied; it's unlikely anyone is actually altering images of cats in scary ways.
I said a thing :).