
Posts by Differential Privacy Papers

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading

Jianming Tong, Hanshen Xiao, Krishna Kumar Nair, Hao Kang, Ashish Sirasao, Ziqi Zhang, G. Edward Suh, Tushar Krishna

http://arxiv.org/abs/2604.17476

Multi-user virtual reality enables immersive interaction. However, rendering avatars for numerous participants on each headset incurs prohibitive computational overhead, limiting scalability. We introduce a framework, Privatar, to offload avatar reconstruction from the headset to untrusted devices within the same local network while safeguarding against adversaries capable of intercepting offloaded data. Privatar's key insight is that domain-specific knowledge of avatar reconstruction enables provably private offloading at minimal cost. (1) System level. We observe that avatar reconstruction is frequency-domain decomposable via blockwise DCT (BDCT) with negligible quality drop, and propose Horizontal Partitioning (HP) to keep high-energy frequency components on-device and offload only low-energy components. HP offloads local computation while restricting information leakage to the low-energy subset only. (2) Privacy level. For individually offloaded, multi-dimensional signals without aggregation, worst-case local Differential Privacy requires prohibitive noise, ruining utility. We observe that users' expression statistics change slowly over time and can be tracked online, and hence propose Distribution-Aware Minimal Perturbation (DAMP). DAMP minimizes noise based on each user's expression distribution to significantly reduce its effect on utility while retaining a formal privacy guarantee. Combined, HP provides empirical privacy against expression identification attacks, and DAMP further augments it to offer a formal guarantee against arbitrary adversaries. On a Meta Quest Pro, Privatar supports 2.37x more concurrent users at 6.5% higher reconstruction loss and 9% energy overhead, providing a better throughput-loss Pareto frontier than quantization, sparsity and local cons
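Read literally, HP is an energy-based split of transform coefficients, with calibrated noise added to the offloaded part. Below is a minimal sketch of that split; the `keep_ratio`, noise scale, and helper names are illustrative assumptions, not Privatar's implementation (DAMP would calibrate the scale per user from the tracked expression distribution).

```python
import math
import random

def sample_laplace(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = min(max(rng.random(), 1e-12), 1 - 1e-12) - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def horizontal_partition(coeffs, keep_ratio):
    """Split frequency coefficients by energy: the highest-energy
    fraction stays on-device, the remainder is offloaded."""
    order = sorted(range(len(coeffs)), key=lambda i: -coeffs[i] ** 2)
    n_keep = max(1, int(keep_ratio * len(coeffs)))
    on_device = set(order[:n_keep])
    local = {i: coeffs[i] for i in on_device}
    offload = {i: coeffs[i] for i in range(len(coeffs)) if i not in on_device}
    return local, offload

def perturb_offloaded(offload, scale, seed=0):
    """Add Laplace noise to every offloaded coefficient before it
    leaves the headset (illustrative fixed scale, not DAMP's)."""
    rng = random.Random(seed)
    return {i: v + sample_laplace(scale, rng) for i, v in offload.items()}

local, offload = horizontal_partition([4.0, 0.1, 3.0, 0.2], keep_ratio=0.5)
noisy = perturb_offloaded(offload, scale=0.1)
```

With `keep_ratio=0.5`, the two high-energy coefficients (indices 0 and 2) stay on-device and only the low-energy pair is perturbed and offloaded.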

Tight Auditing of Differential Privacy in MST and AIM

Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Bogdan Kulynych

http://arxiv.org/abs/2604.18352

State-of-the-art Differentially Private (DP) synthetic data generators such as MST and AIM are widely used, yet tightly auditing their privacy guarantees remains challenging. We introduce a Gaussian Differential Privacy (GDP)-based auditing framework that measures privacy via the full false-positive/false-negative tradeoff. Applied to MST and AIM under worst-case settings, our method provides the first tight audits in the strong-privacy regime. For $(ε,δ)=(1,10^{-2})$, we obtain $μ_{emp}\approx0.43$ vs. implied $μ=0.45$, showing a small theory-practice gap.
  Our code is publicly available: https://github.com/sassoftware/dpmm.
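For readers unfamiliar with GDP, the standard duality between $μ$-GDP and $(ε,δ)$-DP (Dong, Roth & Su's conversion formula) can be computed directly. The sketch below is that generic textbook conversion, not the paper's auditing pipeline; the paper's implied $μ$ may come from a different or tighter accounting.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gdp_delta(mu, eps):
    """delta(eps) implied by mu-GDP:
    Phi(-eps/mu + mu/2) - exp(eps) * Phi(-eps/mu - mu/2)."""
    return (normal_cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * normal_cdf(-eps / mu - mu / 2.0))

def implied_mu(eps, delta, lo=1e-6, hi=20.0):
    """Invert gdp_delta by bisection: gdp_delta is increasing in mu,
    so find the mu whose implied delta at this eps hits the target."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gdp_delta(mid, eps) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d = gdp_delta(1.0, 1.0)      # delta implied by 1-GDP at eps = 1
mu = implied_mu(1.0, 0.01)   # mu implied by (eps, delta) = (1, 1e-2)
```

Comparing an empirically audited $μ_{emp}$ against such an implied $μ$ is what makes the audit's theory-practice gap directly readable on one scale.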

Evaluating LLM Simulators as Differentially Private Data Generators

Nassima M. Bouzid, Dehao Yuan, Nam H. Nguyen, Mayana Pereira

http://arxiv.org/abs/2604.15461

LLM-based simulators offer a promising path for generating complex synthetic data where traditional differentially private (DP) methods struggle with high-dimensional user profiles. But can LLMs faithfully reproduce statistical distributions from DP-protected inputs? We evaluate this using PersonaLedger, an agentic financial simulator, seeded with DP synthetic personas derived from real user statistics. We find that PersonaLedger achieves promising fraud detection utility (AUC 0.70 at epsilon=1) but exhibits significant distribution drift due to systematic LLM biases--learned priors overriding input statistics for temporal and demographic features. These failure modes must be addressed before LLM-based methods can handle the richer user representations where they might otherwise excel.

Privacy, Prediction, and Allocation

Ben Jacobsen, Nitin Kohli

http://arxiv.org/abs/2604.15596

Algorithmic predictions are increasingly used to inform the allocation of scarce resources. The promise of these methods is that, through machine learning, they can better identify the people who would benefit most from interventions. Recently, however, several works have called this assumption into question by demonstrating the existence of settings where simple, unit-level allocation strategies can meet or even exceed the performance of those based on individual-level targeting. Separately, other works have objected to individual-level targeting on privacy grounds, leading to an unusual situation where a single solution, unit-level targeting, is recommended for reasons of both privacy and utility. Motivated by the desire to fully understand the interplay of privacy and targeting levels, we initiate the study of aid allocation systems that satisfy differential privacy, synthesizing existing works on private optimization with the economic models of aid allocation used in the non-private literature. To this end, we investigate private variants of both individual and unit-level allocation strategies in both stochastic and distribution-free settings under a range of constraints on data availability. Through this analysis, we provide clean, interpretable bounds characterizing the tradeoffs between privacy, efficiency, and targeting precision in allocation.

DPDSyn: Improving Differentially Private Dataset Synthesis for Model Training by Downstream Task Guidance

Mingxuan Jia, Wen Huang, Weixin Zhao, Xingyi Wang, Jian Peng, Zhishuo Zhang

http://arxiv.org/abs/2604.15660

How to synthesize a dataset that achieves differential privacy for AI model training is a meaningful but challenging problem. To address it, state-of-the-art methods first select multiple low-dimensional distributions of the original private dataset that can approximate its full distribution with high precision, and then synthesize a dataset obeying all selected low-dimensional distributions. However, it is difficult to select suitable low-dimensional distributions, which in turn degrades the utility of the resulting synthetic dataset. To improve differentially private dataset synthesis, we propose to train a differentially private AI model for downstream tasks on the original private dataset and use the trained model to synthesize datasets. On the one hand, the AI model satisfies differential privacy, so no use of the model can disclose private information from the original private dataset. On the other hand, the model is trained to complete the downstream task, so it preserves the information critical for that task. We use the model to synthesize datasets, improving data utility while preserving privacy. Empirical evaluations on four benchmark datasets demonstrate that our proposed DPDSyn consistently outperforms eight state-of-the-art baselines, with a maximum improvement of 2.40x in accuracy and 333.73x in synthesis efficiency. Further experiments also validate that DPDSyn scales well across varying data scales.

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar, Kamalika Chaudhuri, Yu-Xiang Wang, Ruihan Wu

http://arxiv.org/abs/2604.15851

Differential privacy (DP) has a wide range of applications for protecting data privacy, but designing and verifying DP algorithms requires expert-level reasoning, creating a high barrier for non-expert practitioners. Prior works either rely on specialized verification languages that demand substantial domain expertise or remain semi-automated and require human-in-the-loop guidance. In this work, we investigate whether large language models (LLMs) can automate DP reasoning. We introduce DPrivBench, a benchmark in which each instance asks whether a function or algorithm satisfies a stated DP guarantee under specified assumptions. The benchmark is carefully designed to cover a broad range of DP topics, span diverse difficulty levels, and resist shortcut reasoning through trivial pattern matching. Experiments show that while the strongest models handle textbook mechanisms well, all models struggle with advanced algorithms, revealing substantial gaps in current DP reasoning capabilities. Through further analytic study and failure-mode analysis, we identify several promising directions for improving automated DP reasoning. Our benchmark provides a solid foundation for developing and evaluating such methods, and complements existing benchmarks for mathematical reasoning.

Decoupling Identity from Utility: Privacy-by-Design Frameworks for Financial Ecosystems

Ifayoyinsola Ibikunle, Tyler Farnan, Senthil Kumar, Mayana Pereira

http://arxiv.org/abs/2604.14495

Financial institutions face tension between maximizing data utility and mitigating the re-identification risks inherent in traditional anonymization methods. This paper explores Differentially Private (DP) synthetic data as a robust "Privacy by Design" framework to resolve this conflict, ensuring output privacy while satisfying stringent regulatory obligations. We examine two distinct generative paradigms: Direct Tabular Synthesis, which reconstructs high-fidelity joint distributions from raw data, and DP-Seeded Agent-Based Modeling (ABM), which uses DP-protected aggregates to parameterize complex, stateful simulations. While tabular synthesis excels at reflecting static historical correlations for QA testing and business analytics, the DP-Seeded ABM offers a forward-looking "counterfactual laboratory" capable of modeling dynamic market behaviors and black swan events. By decoupling individual identities from data utility, these methodologies eliminate traditional data-clearing bottlenecks, enabling seamless cross-institutional research and compliant decision-making in an evolving regulatory landscape.

Differentially Private Conformal Prediction

Jiamei Wu, Ce Zhang, Zhipeng Cai, Jingsen Kong, Bei Jiang, Linglong Kong, Lingchen Kong

http://arxiv.org/abs/2604.14621

Conformal prediction (CP) has attracted broad attention as a simple and flexible framework for uncertainty quantification through prediction sets. In this work, we study how to deploy CP under differential privacy (DP) in a statistically efficient manner. We first introduce differential CP, a non-splitting conformal procedure that avoids the efficiency loss caused by data splitting and serves as a bridge between oracle CP and private conformal inference. By exploiting the stability properties of DP mechanisms, differential CP establishes a direct connection to oracle CP and inherits corresponding validity behavior. Building on this idea, we develop Differentially Private Conformal Prediction (DPCP), a fully private procedure that combines DP model training with a private quantile mechanism for calibration. We establish the end-to-end privacy guarantee of DPCP and investigate its coverage properties under additional regularity conditions. We further study the efficiency of both differential CP and DPCP under empirical risk minimization and general regression models, showing that DPCP can produce tighter prediction sets than existing private split conformal approaches under the same privacy budget. Numerical experiments on synthetic and real datasets demonstrate the practical effectiveness of the proposed methods.
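One standard ingredient for a "private quantile mechanism for calibration" is the exponential-mechanism quantile for bounded scores (Smith, 2011). The sketch below illustrates that generic building block under the assumption of scores in a known range; it is not DPCP's actual mechanism.

```python
import math
import random

def private_quantile(scores, q, eps, lo=0.0, hi=1.0, rng=None):
    """Exponential-mechanism quantile for scores known to lie in
    [lo, hi]. Returns an eps-DP estimate of the q-quantile, usable as
    the calibration threshold in split conformal prediction. The
    quantile's rank has sensitivity 1 under add/remove of one score."""
    rng = rng or random.Random(0)
    s = [lo] + sorted(min(max(x, lo), hi) for x in scores) + [hi]
    n = len(scores)
    target = q * n
    # Gap i = (s[i], s[i+1]) is sampled with probability proportional
    # to its length times exp(-eps * |i - target| / 2).
    weights = [(s[i + 1] - s[i]) * math.exp(-eps * abs(i - target) / 2.0)
               for i in range(len(s) - 1)]
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return rng.uniform(s[i], s[i + 1])
    return s[-1]

# With a large eps, the output concentrates on the true quantile gap.
v = private_quantile([0.1 * i for i in range(1, 10)], q=0.9, eps=50.0)
```

With nine scores 0.1 through 0.9 and `q=0.9`, the target rank is 8.1, so at high `eps` the mechanism almost surely returns a point in the gap (0.8, 0.9).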

Sequential Change Detection for Multiple Data Streams with Differential Privacy

Lixing Zhang, Liyan Xie, Ruizhi Zhang

http://arxiv.org/abs/2604.13274

Sequential change-point detection seeks to rapidly identify distributional changes in streaming data while controlling false alarms. Existing multi-stream detection methods typically rely on non-private access to raw observations or intermediate statistics, limiting their usage in privacy-sensitive settings. We study sequential change-point detection for multiple data streams under differential privacy constraints. We consider multiple independent streams undergoing a synchronized change at an unknown time and in an unknown subset of streams, and propose DP-SUM-CUSUM, a differentially private detection procedure based on the summation of per-stream CUSUM statistics with calibrated Laplace noise injection. We show that DP-SUM-CUSUM satisfies sequential $\varepsilon$-differential privacy and derive bounds on the average run length to false alarm and the worst-case average detection delay, explicitly characterizing the privacy--efficiency tradeoff. A truncation-based extension is also presented to handle distributional shifts with unbounded log-likelihood ratios. Simulations and experiments on an Internet of Things (IoT) botnet dataset validate the proposed approach.
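The detection loop the abstract describes — per-stream CUSUM recursions, summed and perturbed with Laplace noise, compared against a threshold — can be sketched as follows. The `noise_scale` here is a placeholder parameter; the paper derives the calibration needed to satisfy sequential $\varepsilon$-DP, and the clipping mirrors its truncation-based handling of unbounded log-likelihood ratios.

```python
import math
import random

def sample_laplace(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = min(max(rng.random(), 1e-12), 1 - 1e-12) - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_sum_cusum(llr_streams, clip_val, noise_scale, threshold, seed=0):
    """Summed CUSUM over K streams with per-round Laplace perturbation.
    llr_streams[k][t] is stream k's log-likelihood ratio at time t,
    clipped to [-clip_val, clip_val] to bound sensitivity. Returns the
    first alarm time (1-based) or None."""
    rng = random.Random(seed)
    K, T = len(llr_streams), len(llr_streams[0])
    w = [0.0] * K
    for t in range(T):
        for k in range(K):
            llr = max(-clip_val, min(clip_val, llr_streams[k][t]))
            w[k] = max(0.0, w[k] + llr)   # per-stream CUSUM recursion
        stat = sum(w) + sample_laplace(noise_scale, rng)
        if stat > threshold:
            return t + 1
    return None

# Two streams whose LLR turns positive at t = 6 (a synchronized change).
streams = [[0.0] * 5 + [1.0] * 10] * 2
alarm = dp_sum_cusum(streams, clip_val=1.0, noise_scale=1e-9, threshold=4.5)
```

Raising `noise_scale` (stronger privacy) forces a higher `threshold` for the same false-alarm rate, which is exactly the privacy-efficiency tradeoff the paper's bounds characterize.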

Cross-Domain Query Translation for Network Troubleshooting: A Multi-Agent LLM Framework with Privacy Preservation and Self-Reflection

Nguyen Phuc Tran, Brigitte Jaumard, Karthikeyan Premkumar, Salman Memon

http://arxiv.org/abs/2604.13353

This paper presents a hierarchical multi-agent LLM architecture to bridge communication gaps between non-technical end users and telecommunications domain experts in private network environments. We propose a cross-domain query translation framework that leverages specialized language models coordinated through multi-agent reflection-based reasoning. The resulting system addresses three critical challenges: (1) accurately classifying user queries related to telecommunications network issues using a dual-stage hierarchical approach, (2) preserving user privacy through the anonymization of semantically relevant personally identifiable information (PII) while maintaining diagnostic utility, and (3) translating technical expert responses into user-comprehensible language.
  Our approach employs ReAct-style agents enhanced with self-reflection mechanisms for iterative output refinement, semantic-preserving anonymization techniques respecting $k$-anonymity and differential privacy principles, and few-shot learning strategies designed for limited training data scenarios. The framework was comprehensively evaluated on 10,000 previously unseen validation scenarios across various vertical industries.

Refined Differentially Private Linear Regression via Extension of a Free Lunch Result

Sasmita Harini S, Anshoo Tandon

http://arxiv.org/abs/2604.11820

As data-privacy regulations tighten and statistical models are increasingly deployed on sensitive human-sourced data, privacy-preserving linear regression has become a critical necessity. For the add-remove DP model, Kulesza et al. (2024) and Fitzsimons et al. (2024) have independently shown that the size of the dataset -- an important statistic for linear regression -- can be privately estimated for "free", via a simplex transformation of bounded variables and private sum queries on the transformed variables. In this work, we extend this free lunch result via carefully crafted multidimensional simplex transformations to variables and functions that are bounded in the interval [0,1]. We show that these transformations can be applied to refine the estimates of sufficient statistics needed for private simple linear regression based on ordinary least squares. We provide both analytical and numerical results to demonstrate the superiority of our approach. Our proposed transformations have general applicability and can be readily adapted for differentially private polynomial regression.
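The one-dimensional version of the "free lunch" reads as follows: mapping each x in [0,1] to the simplex point (x, 1-x) makes the pair of coordinate sums have joint L1 sensitivity 1 under add/remove, so Laplace noise of scale 1/eps on each sum privatizes both at once, and their total recovers the dataset size at no extra budget. A hedged sketch of that idea (not the paper's refined multidimensional transformations):

```python
import math
import random

def sample_laplace(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = min(max(rng.random(), 1e-12), 1 - 1e-12) - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_sum_and_count(xs, eps, seed=0):
    """Free-lunch count via the simplex transform: each x in [0,1]
    contributes (x, 1-x), whose coordinates sum to 1, so adding or
    removing one record changes the pair of sums by L1 distance
    exactly 1. Laplace(1/eps) noise on each coordinate sum therefore
    gives eps-DP for both sums jointly, and their total is a noisy n."""
    rng = random.Random(seed)
    s1 = sum(xs) + sample_laplace(1.0 / eps, rng)
    s0 = sum(1.0 - x for x in xs) + sample_laplace(1.0 / eps, rng)
    return s1, s1 + s0   # (noisy sum of the xs, noisy count)

# With a very weak privacy level the estimates are near-exact.
s, n = private_sum_and_count([0.5] * 10, eps=1e6)
```

Both the sum and the count feed directly into the sufficient statistics of ordinary-least-squares regression, which is why refining these estimates tightens the private regression overall.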

Refined Differentially Private Linear Regression via Extension of a Free Lunch Result

Sasmita Harini S, Anshoo Tandon

http://arxiv.org/abs/2604.11820

6 days ago
Modular Verification of Differential Privacy in Probabilistic Higher-Order Separation Logic (Extended Version)

Philipp G. Haselwarter, Alejandro Aguirre, Simon Oddershede Gregersen, Kwing Hei Li, Joseph Tassarotti, Lars Birkedal

http://arxiv.org/abs/2604.12713

Differential privacy is the standard method for privacy-preserving data analysis. The importance of having strong guarantees on the reliability of implementations of differentially private algorithms is widely recognized and has sparked fruitful research on formal methods. However, the design patterns and language features used in modern DP libraries as well as the classes of guarantees that the library designers wish to establish often fall outside of the scope of previous verification approaches.
  We introduce a program logic suitable for verifying differentially private implementations written in complex, general-purpose programming languages. Our logic has first-class support for reasoning about privacy budgets as a separation logic resource. The expressiveness of the logic and the target language allow our approach to handle common programming patterns used in the implementation of libraries for differential privacy, such as privacy filters and caching. While previous work has focused on developing guarantees for programs written in domain-specific languages or for privacy mechanisms in isolation, our logic can reason modularly about primitives, higher-order combinators, and interactive algorithms.
  We demonstrate the applicability of our approach by implementing a verified library of differential privacy mechanisms, including an online version of the Sparse Vector Technique, as well as a privacy filter inspired by the popular Python library OpenDP, which crucially relies on our ability to handle the combination of randomization, local state, and higher-order functions. We demonstrate that our specifications are general and reusable by instantiating them to verify clients of our library. All of our r

6 days ago
Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge

Gustavo de Carvalho Bertoli

http://arxiv.org/abs/2604.12737

While Federated Learning (FL) mitigates direct data exposure, the resulting trained models remain susceptible to membership inference attacks (MIAs). This paper presents an empirical evaluation of Differential Privacy (DP) as a defense mechanism against MIAs in FL, leveraging the environment of the 2025 NIST Genomics Privacy-Preserving Federated Learning (PPFL) Red Teaming Event. To improve inference accuracy, we propose a stacking attack strategy that ensembles seven black-box estimators to train a meta-classifier on prediction probabilities and cross-entropy losses. We evaluate this methodology against target models under three privacy configurations: an unprotected convolutional neural network (CNN, $ε=\infty$), a low-privacy DP model ($ε=200$), and a high-privacy DP model ($ε=10$). The attack outperforms all baselines in the No DP and Low Privacy settings and, critically, maintains measurable membership leakage at $ε=200$ where a single-signal LiRA baseline collapses. Evaluated on an independent third-party benchmark, these results provide an empirical characterisation of how stacking-based inference degrades across calibrated DP tiers in FL.
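The stacking strategy can be illustrated in miniature: per-example attack signals become features for a meta-classifier that predicts membership. The sketch below uses synthetic data and only two of the signals named in the abstract (true-class confidence and cross-entropy loss), with a plain logistic regression standing in for the paper's seven-estimator ensemble; all numbers here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for shadow-model outputs: members tend to receive
# higher true-class confidence (hence lower cross-entropy) than non-members.
n = 400
members = rng.integers(0, 2, size=n)
true_conf = np.clip(rng.normal(0.55 + 0.25 * members, 0.15), 1e-3, 1 - 1e-3)
xent = -np.log(true_conf)

# Stack the per-example signals into a feature matrix (plus a bias column).
X = np.column_stack([true_conf, xent, np.ones(n)])
y = members.astype(float)

# Minimal logistic-regression meta-classifier trained by gradient descent.
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / n

scores = 1.0 / (1.0 + np.exp(-X @ w))
acc = ((scores > 0.5) == members).mean()
print(round(float(acc), 2))
```

The point of stacking is that the meta-classifier can weigh several weak signals jointly, which is why it can retain leakage where a single-signal attack like LiRA collapses.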

6 days ago
Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations

Tong Zhang, Jiangning Zhang, Zhucun Xue, Juntao Jiang, Yicheng Xu, Chengming Xu, Teng Hu, Xingyu Xie, Xiaobin Hu, Yabiao Wang, Yong Liu, Shuicheng Yan

http://arxiv.org/abs/2604.12968

Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam, serve as the cornerstone of modern training pipelines. However, large-scale model training, stringent differential privacy requirements, and distributed learning paradigms expose critical limitations in these conventional approaches regarding privacy protection and memory efficiency. To mitigate these bottlenecks, researchers explore second-order optimization techniques to surpass first-order performance ceilings, while zeroth-order methods reemerge to alleviate memory constraints inherent to large-scale training. Despite this proliferation of methodologies, the field lacks a cohesive framework that unifies underlying principles and delineates application scenarios for these disparate approaches. In this work, we retrospectively analyze the evolutionary trajectory of deep learning optimization algorithms and present a comprehensive empirical evaluation of mainstream optimizers across diverse model architectures and training scenarios. We distill key emerging trends and fundamental design trade-offs, pinpointing promising directions for future research. By synthesizing theoretical insights with extensive empirical evidence, we provide actionable guidance for designing next-generation highly efficient, robust, and trustworthy optimization methods. The code is available at https://github.com/APRIL-AIGC/Awesome-Optimizer.
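For concreteness, the two first-order updates the abstract names can be written side by side in their textbook form (this is standard background, not material from the paper):

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    # Vanilla SGD: step against the gradient.
    return w - lr * g

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Textbook Adam: bias-corrected first- and second-moment estimates.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# Toy objective f(w) = ||w||^2 / 2, whose gradient at w is w itself.
w_sgd = np.array([1.0, -2.0])
w_adam = np.array([1.0, -2.0])
state = (np.zeros(2), np.zeros(2), 0)
for _ in range(100):
    w_sgd = sgd_step(w_sgd, w_sgd)
    w_adam, state = adam_step(w_adam, w_adam, state)
print(float(np.abs(w_sgd).max()))
```

Adam's per-coordinate normalization is what gives it scale invariance, but also what makes its memory footprint (two extra buffers per parameter) a bottleneck at scale, motivating the zeroth-order methods the survey discusses.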

6 days ago
Mask-Free Privacy Extraction and Rewriting: A Domain-Aware Approach via Prototype Learning

Xiaodong Li, Yuhua Wang, Qingchen Yu, Zixuan Qin, Yifan Sun, Qinnan Zhang, Hainan Zhang, Zhiming Zheng

http://arxiv.org/abs/2604.10145

Client-side privacy rewriting is crucial for deploying LLMs in privacy-sensitive domains. However, existing approaches struggle to balance privacy and utility. Full-text methods often distort context, while span-level approaches rely on impractical manual masks or brittle static dictionaries. Attempts to automate localization via prompt-based LLMs prove unreliable, as they suffer from unstable instruction following that leads to privacy leakage and excessive context scrubbing. To address these limitations, we propose DAMPER (Domain-Aware Mask-free Privacy Extraction and Rewriting). DAMPER operationalizes latent privacy semantics into compact Domain Privacy Prototypes via contrastive learning, enabling precise, autonomous span localization. Furthermore, we introduce a Prototype-Guided Preference Alignment, which leverages learned prototypes as semantic anchors to construct preference pairs, optimizing a domain-compliant rewriting policy without human annotations. At inference time, DAMPER integrates a sampling-based Exponential Mechanism to provide rigorous span-level Differential Privacy (DP) guarantees. Extensive experiments demonstrate that DAMPER significantly outperforms existing baselines, achieving a superior privacy-utility trade-off.
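The span-level DP guarantee rests on the standard exponential mechanism, which samples a replacement with probability proportional to exp(εu / 2Δu). A minimal sketch, in which the candidate replacements and their utility scores are hypothetical stand-ins (DAMPER's actual utilities come from its learned prototypes):

```python
import numpy as np

def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0, rng=None):
    # Sample a candidate with probability proportional to exp(eps * u / (2 * sens)).
    rng = rng or np.random.default_rng()
    u = np.array([utility(c) for c in candidates], dtype=float)
    logits = epsilon * u / (2.0 * sensitivity)
    logits -= logits.max()          # shift for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return candidates[rng.choice(len(candidates), p=p)]

# Hypothetical rewrite candidates for a detected sensitive span, with
# made-up utility scores (higher = more faithful and less sensitive).
candidates = ["the patient", "an individual", "J. Doe", "a person"]
utility = {"the patient": 0.9, "an individual": 0.8, "J. Doe": 0.1, "a person": 0.7}

choice = exponential_mechanism(candidates, utility.get, epsilon=2.0)
print(choice)
```

Because sampling probabilities depend only exponentially on utility, low-utility candidates remain possible but unlikely; that calibrated randomness is precisely what yields the formal DP guarantee.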

1 week ago
Replicable Composition

Kiarash Banihashem, MohammadHossein Bateni, Hossein Esfandiari, Samira Goudarzi, MohammadTaghi Hajiaghayi

http://arxiv.org/abs/2604.10423

Replicability requires that algorithmic conclusions remain consistent when rerun on independently drawn data. A central structural question is composition: given $k$ problems each admitting a $ρ$-replicable algorithm with sample complexity $n$, how many samples are needed to solve all jointly while preserving replicability? The naive analysis yields $\widetilde{O}(nk^2)$ samples, and Bun et al. (STOC'23) observed that reductions through differential privacy give an alternative $\widetilde{O}(n^2k)$ bound, leaving open whether the optimal $\widetilde{O}(nk)$ scaling is achievable. We resolve this open problem and, more generally, show that problems with sample complexities $n_1,\ldots,n_k$ can be jointly solved with $\widetilde{O}(\sum_i n_i)$ samples while preserving constant replicability. Our approach converts each replicable algorithm into a perfectly generalizing one, composes them via a privacy-style analysis, and maps back via correlated sampling. This yields the first advanced composition theorem for replicability. En route, we obtain new bounds for the composition of perfectly generalizing algorithms with heterogeneous parameters.
  As part of our results, we provide a boosting theorem for the success probability of replicable algorithms. For a broad class of problems, the failure probability appears as a separate additive term independent of $ρ$, immediately yielding improved sample complexity bounds for several problems.
  Finally, we prove an $Ω(nk^2)$ lower bound for adaptive composition, establishing a quadratic separation from the non-adaptive setting. The key technique, which we call the phantom run, yields structural results of independent interest.

1 week ago
Tradeoffs in Privacy, Welfare, and Fairness for Facility Location

Sara Fish, Yannai A. Gonczarowski, Jason Z. Tang, Salil Vadhan

http://arxiv.org/abs/2604.10443

The differentially private (DP) facility location problem seeks to determine a socially optimal placement for a public facility while ensuring that each participating agent's location remains private. To privatize its input data, a DP mechanism must inject noise into its output distribution, producing a placement that will have lower expected social welfare than the optimal spot for the facility. The privacy-induced welfare loss can be viewed as the "cost of privacy," illustrating a tradeoff between social welfare and privacy that has been the focus of prior work. Yet, the imposition of privacy also induces a third consideration that has not been similarly studied: fairness in how the "cost of privacy" is distributed across individuals. For instance, a mechanism may satisfy DP with minimal social welfare loss, yet still be undesirable if that loss falls entirely on one individual. In this paper, we quantify this new notion of unfairness and design mechanisms for facility location that attempt to simultaneously optimize across privacy, social welfare, and fairness.
  We first derive an impossibility result, showing that privacy and fairness cannot be simultaneously guaranteed over all possible datasets that could represent the locations of individuals in a population. We then consider a relaxation that still requires worst-case DP, but only seeks fairness and social welfare over smaller, more "realistic-looking" families of datasets. For this relaxation, we construct a DP mechanism and demonstrate that it is simultaneously optimal (or, for a harder family of datasets, near-optimal up to small factors) on fairness and social welfare. This suggests that while there is a tradeoff between privacy and each of social welfare and fairness, there is no additional tradeoff when we consider all three objectives simu
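As a point of reference for the mechanisms studied here, the textbook DP approach to one-dimensional facility location is the exponential mechanism over candidate placements, scored by negative total distance. The sketch below is this illustrative baseline only, not the paper's fairness-aware construction; the agent locations and candidate grid are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up agent locations on [0, 1]; the social welfare of a placement is
# minus the total distance from the facility to all agents.
agents = rng.uniform(0.0, 1.0, size=50)
candidates = np.linspace(0.0, 1.0, 101)
welfare = -np.abs(candidates[:, None] - agents[None, :]).sum(axis=1)

# Exponential mechanism: moving one agent changes any candidate's welfare
# by at most 1 (all locations lie in [0, 1]), so the sensitivity is 1.
epsilon = 1.0
logits = epsilon * welfare / 2.0
logits -= logits.max()
probs = np.exp(logits)
probs /= probs.sum()
placement = float(candidates[rng.choice(len(candidates), p=probs)])
print(round(placement, 2))
```

The welfare loss of such a mechanism relative to the optimal placement is the "cost of privacy"; the paper's question is how that cost is distributed across the individual agents.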

1 week ago
Differentially Private Verification of Distribution Properties

Elbert Du, Cynthia Dwork, Pranay Tankala, Linjun Zhang

http://arxiv.org/abs/2604.10819

A recent line of work initiated by Chiesa and Gur and further developed by Herman and Rothblum investigates the sample and communication complexity of verifying properties of distributions with the assistance of a powerful, knowledgeable, but untrusted prover. In this work, we initiate the study of differentially private (DP) distribution property testing. After all, if we do not trust the prover to help us with verification, why should we trust it with our sensitive sample? We map a landscape of DP prover-aided proofs of properties of distributions. In the non-private case it is known that one-round (two-message) private-coin protocols can have substantially lower complexity than public-coin AM protocols, but in the private case, the possibility for improvement depends on the parameter regime and privacy model. Drawing on connections to replicability and techniques for amplification, we show: (1) There exists a reduction from any one-round $(\varepsilon,δ)$-DP private-coin interactive proof to a one-round public-coin DP interactive proof with the same privacy parameters, for the parameter regime $\varepsilon=O(1/\sqrt{n})$ and $δ=O(1/n^{5/2})$, and with the same sample and communication complexities. (2) If the verifier's message in the private-coin interactive proof is $O(1/\sqrt{\log n})$ locally DP -- a far more relaxed privacy parameter regime in a different model -- then applying one additional transformation again yields a one-round public-coin protocol with the same privacy bound and the same sample and computational complexities. (3) However, when the privacy guarantee is very relaxed ($\varepsilon \in Ω(\log n)$), private coins indeed reduce complexity. We also obtain a Merlin-Arthur (one-message) proof for privately testing whether samples are drawn from a product distribution, and prove that its sample com

1 week ago
Answering Counting Queries with Differential Privacy on a Quantum Computer

Arghya Mukherjee, Hassan Jameel Asghar, Gavin K. Brennen

http://arxiv.org/abs/2604.10881

Differential privacy is a mathematical notion of data privacy that has fast become the de facto standard in privacy-preserving data analysis. Recently, a lot of work has focused on differential privacy in the quantum setting. Continuing this line of study, we investigate how to answer counting queries on a quantum-encoded dataset with differential privacy. An example of a counting query is "How many people in the dataset are over the age of 25 and have a university education?" Counting queries form the most basic but nonetheless rich set of statistics extractable from a dataset. We show that answering these queries on a quantum-encoded dataset reduces to measuring the amplitude of one of two orthogonal states. We then analyze the differential privacy properties of two algorithms from the literature for measuring amplitude: one that performs repeated measurements in the computational basis, and one that uses the classic amplitude estimation algorithm. For the first technique, we prove privacy results for counting queries that improve on previously known results for general queries, and show that the mechanism in fact \emph{amplifies} privacy due to its inherent randomness. For the second method, we derive a tight bound on the maximum possible change in the amplitude when a single item is added to or removed from the dataset, a quantity called global sensitivity which is central in making an algorithm differentially private. We then show a differentially private version of the amplitude estimation algorithm for counting queries. We also discuss how these methods can be outsourced to a quantum server to blindly compute counting queries with differential privacy.
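For contrast with the quantum mechanisms, the classical baseline for such a counting query is the Laplace mechanism: under add/remove DP the query has global sensitivity 1, so noise of scale 1/ε suffices. A small sketch with made-up records:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up records: (age, has_university_education).
records = [(31, True), (22, True), (45, False), (29, True), (19, False)]

# The abstract's example query. Adding or removing one person changes the
# count by at most 1, so the global sensitivity of the query is 1.
true_count = sum(1 for age, uni in records if age > 25 and uni)

# Classical Laplace mechanism: noise of scale sensitivity / epsilon.
epsilon = 1.0
noisy_count = true_count + rng.laplace(scale=1.0 / epsilon)
print(true_count)  # 2
```

The quantum analysis above plays the same game, except that the "sensitivity" being bounded is the change in a state's amplitude rather than in a count.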

1 week ago
Realisation-Level Privacy Filtering

Sophie Taylor, Praneeth Vippathalla, Justin Coon

http://arxiv.org/abs/2604.08630

We study differentially private data release, where a database is accessed through successive, possibly adaptive queries and mechanisms. Existing composition theorems and privacy filters combine worst case per-round privacy parameters, leaving room for more refined accounting based on realised leakage, which we term realisation-level accounting. We propose a realisation-level filtering approach to determine stopping times for data releases, and design one such filter. Despite technical challenges arising from conditioning on realisations and stopping time, we prove that the filter guarantees $(ε, δ)$-differential privacy, with $ε$ and $δ$ chosen by the data handler. Through numerical evidence, we demonstrate that realisation-level filtering provides a path to better utility beyond mechanism-level methods. Furthermore, our proposed filter applies to arbitrary mechanisms, including those that are badly behaved under Rényi differential privacy.
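For readers unfamiliar with privacy filters, the mechanism-level baseline that this work refines simply adds up worst-case per-round budgets and halts once the global target would be exceeded. A minimal sketch of that baseline follows (the realisation-level filter proposed here accounts for realised leakage instead and is not shown):

```python
class BasicCompositionFilter:
    """Mechanism-level privacy filter: sum worst-case per-round budgets and
    refuse to answer once the global (epsilon, delta) target would be exceeded."""

    def __init__(self, eps_budget, delta_budget):
        self.eps_budget = eps_budget
        self.delta_budget = delta_budget
        self.eps_spent = 0.0
        self.delta_spent = 0.0

    def try_spend(self, eps, delta):
        # HALT (return False) if answering this query would break the guarantee.
        if (self.eps_spent + eps > self.eps_budget
                or self.delta_spent + delta > self.delta_budget):
            return False
        self.eps_spent += eps
        self.delta_spent += delta
        return True

f = BasicCompositionFilter(eps_budget=1.0, delta_budget=1e-5)
print(f.try_spend(0.4, 0.0))  # True
print(f.try_spend(0.4, 0.0))  # True
print(f.try_spend(0.4, 0.0))  # False: the filter halts rather than exceed 1.0
```

Because this baseline charges each round its worst-case parameters regardless of what was actually released, it can stop far earlier than necessary, which is the slack that realisation-level accounting aims to recover.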

Realisation-Level Privacy Filtering

Sophie Taylor, Praneeth Vippathalla, Justin Coon

http://arxiv.org/abs/2604.08630
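For context, the mechanism-level accounting that realisation-level filtering refines can be sketched as a simple budget filter under basic composition (illustrative names; this is the baseline, not the paper's construction):

```python
class BasicPrivacyFilter:
    """Mechanism-level privacy filter (illustrative): halt the release
    stream once worst-case composed parameters would exceed the budget.
    A realisation-level filter instead accounts for the leakage actually
    realised at each round."""

    def __init__(self, eps_budget: float, delta_budget: float):
        self.eps_budget = eps_budget
        self.delta_budget = delta_budget
        self.eps_spent = 0.0
        self.delta_spent = 0.0

    def try_spend(self, eps_i: float, delta_i: float) -> bool:
        # Basic composition: worst-case parameters simply add across rounds.
        if (self.eps_spent + eps_i > self.eps_budget
                or self.delta_spent + delta_i > self.delta_budget):
            return False  # stopping time reached: no further releases
        self.eps_spent += eps_i
        self.delta_spent += delta_i
        return True

filt = BasicPrivacyFilter(eps_budget=1.0, delta_budget=1e-5)
decisions = [filt.try_spend(0.3, 0.0) for _ in range(5)]
# decisions == [True, True, True, False, False]: three rounds of eps = 0.3
# fit under the total budget of 1.0; the fourth would overshoot.
```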

Node-Private Community Detection in Stochastic Block Models

Olga Klopp, Ilias Zadik

http://arxiv.org/abs/2604.09078

We study community detection in stochastic block models under pure node-level differential privacy, a stringent notion that protects the participation of an individual together with all of their incident edges. This setting is substantially more challenging than edge-private community detection, since modifying a single node can affect linearly many observations. On the algorithmic side, we analyze a node-private estimator based on the exponential mechanism combined with an extension lemma, and show that exact recovery remains achievable. In the standard sparse regime with logarithmic average degree and a fixed number of communities, our results imply that a logarithmic privacy budget suffices to obtain nontrivial recovery guarantees. On the lower bound side, we show that this logarithmic scaling is in fact unavoidable: any pure node-private method must fail to achieve polynomially small exact-recovery error, or polynomially small expected mismatch, unless the privacy budget is at least of this order. Moreover, in the regime of super-logarithmic privacy budgets, our upper and lower bounds yield a matching two-term characterization of the minimax risk, with one term governed by the non-private statistical signal and the other by the privacy budget; these match up to universal constants in the exponents. Taken together, our results identify an inherent logarithmic privacy cost in node-private community detection, absent under edge differential privacy, and provide a precise rate-level characterization of the tradeoff between node privacy and SBM recovery.
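As a toy illustration of the algorithmic ingredient, the exponential mechanism selects a community labelling with probability proportional to exp(ε · score / 2Δ). The brute-force enumeration, the agreement score, and the crude node-level sensitivity bound below are simplifications for a tiny graph, not the paper's estimator:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-block SBM: nodes 0-3 in community 0, nodes 4-7 in community 1.
n, p_in, p_out = 8, 0.9, 0.1
truth = np.repeat([0, 1], 4)
edge_p = np.where(truth[:, None] == truth[None, :], p_in, p_out)
adj = np.triu(rng.random((n, n)) < edge_p, 1)
adj = (adj | adj.T).astype(int)

def agreement(labels: np.ndarray) -> float:
    # Pairs that "agree" with the labelling: edges inside a community
    # plus non-edges across communities.
    same = labels[:, None] == labels[None, :]
    return float(np.triu(np.where(same, adj, 1 - adj), 1).sum())

eps = 2.0
sensitivity = n - 1  # changing one node can flip up to n-1 incident pairs
cands = [np.array(c) for c in itertools.product([0, 1], repeat=n)]
w = np.array([np.exp(eps * agreement(c) / (2 * sensitivity)) for c in cands])
labels = cands[rng.choice(len(cands), p=w / w.sum())]  # private labelling
```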

Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation

Qian Ma, Sarah Rajtmajer

http://arxiv.org/abs/2604.07486

Large language models (LLMs) have emerged as a powerful tool for synthetic data generation. A particularly important use case is producing synthetic replicas of private text, which requires carefully balancing privacy and utility. We propose Realistic and Privacy-Preserving Synthetic Data Generation (RPSG), which leverages privacy-preserving mechanisms, including formal differential privacy (DP); and private seeds, in particular text containing personal information, to generate realistic synthetic data. Comprehensive experiments against state-of-the-art private synthetic data generation methods demonstrate that RPSG achieves high fidelity to private data while providing strong privacy protection.

Differentially Private Modeling of Disease Transmission within Human Contact Networks

Shlomi Hod, Debanuj Nayak, Jason R. Gantenberg, Iden Kalemaj, Thomas A. Trikalinos, Adam Smith

http://arxiv.org/abs/2604.07493

Epidemiologic studies of infectious diseases often rely on models of contact networks to capture the complex interactions that govern disease spread, and ongoing projects aim to vastly increase the scale at which such data can be collected. However, contact networks may include sensitive information, such as sexual relationships or drug use behavior. Protecting individual privacy while maintaining the scientific usefulness of the data is crucial. We propose a privacy-preserving pipeline for disease spread simulation studies based on a sensitive network that integrates differential privacy (DP) with statistical network models such as stochastic block models (SBMs) and exponential random graph models (ERGMs). Our pipeline comprises three steps: (1) compute network summary statistics using \emph{node-level} DP (which corresponds to protecting individuals' contributions); (2) fit a statistical model, like an ERGM, using these summaries, which allows generating synthetic networks reflecting the structure of the original network; and (3) simulate disease spread on the synthetic networks using an agent-based model. We evaluate the effectiveness of our approach using a simple Susceptible-Infected-Susceptible (SIS) disease model under multiple configurations. We compare both numerical results, such as simulated disease incidence and prevalence, as well as qualitative conclusions such as intervention effect size, on networks generated with and without differential privacy constraints. Our experiments are based on egocentric sexual network data from the ARTNet study (a survey about HIV-related behaviors). Our results show that the noise added for privacy is small relative to other sources of error (sampling and model misspecification). This suggests that, in princi
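A heavily simplified version of the three-step pipeline might look like the following, using edge-level Laplace noise on two edge-density summaries (the paper uses node-level DP, which requires more care), a two-block SBM fit, and an SIS simulation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sensitive network (simulated here): a two-block contact graph.
n, blocks = 200, np.repeat([0, 1], 100)
p_true = np.where(np.equal.outer(blocks, blocks), 0.06, 0.01)
adj = np.triu(rng.random((n, n)) < p_true, 1)
adj = adj | adj.T

# Step 1: DP summaries -- Laplace noise on within/between edge counts,
# with the budget eps split evenly across the two count queries.
eps = 1.0
within = np.triu(np.equal.outer(blocks, blocks), 1)
between = np.triu(~np.equal.outer(blocks, blocks), 1)

def noisy_rate(mask):
    count = adj[mask].sum() + rng.laplace(0, 2 / eps)  # sensitivity 1 per edge
    return float(np.clip(count / mask.sum(), 0, 1))

p_in_hat, p_out_hat = noisy_rate(within), noisy_rate(between)

# Step 2: sample a synthetic SBM network from the noisy summaries.
p_fit = np.where(np.equal.outer(blocks, blocks), p_in_hat, p_out_hat)
syn = np.triu(rng.random((n, n)) < p_fit, 1)
syn = (syn | syn.T).astype(int)

# Step 3: agent-based SIS disease simulation on the synthetic network.
infected = rng.random(n) < 0.05
beta, gamma, prevalence = 0.3, 0.1, []
for _ in range(50):
    exposure = syn @ infected.astype(int)          # infected contacts per node
    catch = (~infected) & (rng.random(n) < 1 - (1 - beta) ** exposure)
    recover = infected & (rng.random(n) < gamma)
    infected = (infected | catch) & ~recover
    prevalence.append(float(infected.mean()))
```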

Interpreting the Error of Differentially Private Median Queries through Randomization Intervals

Thomas Humphries, Tim Li, Shufan Zhang, Karl Knopf, Xi He

http://arxiv.org/abs/2604.07581

It can be difficult for practitioners to interpret the quality of differentially private (DP) statistics due to the added noise. One method to help analysts understand the amount of error introduced by DP is to return a Randomization Interval (RI) along with the statistic. An RI is a type of confidence interval that bounds the error introduced by DP. For queries where the noise distribution depends on the input, such as the median, prior work degrades the quality of the median itself to obtain a high-quality RI. In this work, we propose PostRI, a solution to compute an RI after the median has been estimated. PostRI enables median estimation with 14%-850% higher utility than related work, while maintaining a narrow RI.
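For a query whose noise distribution does not depend on the input (unlike the median case the paper addresses), a randomization interval is straightforward; this Laplace-count sketch is only meant to convey what an RI is:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count_with_ri(true_count: int, eps: float, alpha: float = 0.05):
    """Release a count under eps-DP Laplace noise together with a (1 - alpha)
    randomization interval bounding the DP-induced error. For Laplace(1/eps)
    noise, P(|noise| > t) = exp(-t * eps), so the symmetric (1 - alpha)
    error bound is ln(1/alpha) / eps."""
    noisy = true_count + rng.laplace(0.0, 1.0 / eps)  # counts have sensitivity 1
    half_width = np.log(1.0 / alpha) / eps
    return noisy, (noisy - half_width, noisy + half_width)

value, (lo, hi) = laplace_count_with_ri(true_count=120, eps=0.5)
# With probability at least 1 - alpha, the true count lies in [lo, hi].
```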

Inexact Limited Memory Bundle Method

Jenni Lampainen, Kaisa Joki, Napsu Karmitsa, Marko M. Mäkelä

http://arxiv.org/abs/2604.08067

Large-scale nonsmooth optimization problems arise in many real-world applications, but obtaining exact function and subgradient values for these problems may be computationally expensive or even infeasible. In many practical settings, only inexact information is available due to measurement or modeling errors, privacy-preserving computations, or stochastic approximations, making inexact optimization methods particularly relevant. In this paper, we propose a novel inexact limited memory bundle method for large-scale nonsmooth nonconvex optimization. The method tolerates noise in both function values and subgradients. We prove the global convergence of the proposed method to an approximate stationary point. Numerical experiments with different levels of noise in function and/or subgradient values show that the method performs well with both exact and noisy data. In particular, the results demonstrate competitiveness in large-scale nonsmooth optimization and highlight the suitability of the method for applications where noise is unavoidable, such as differential privacy in machine learning.

TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems

Labani Halder, Payel Sadhukhan, Sarbani Palit

http://arxiv.org/abs/2604.08113

Ensuring reliability in adversarial settings necessitates treating privacy as a foundational component of data-driven systems. While differential privacy and cryptographic protocols offer strong guarantees, existing schemes rely on a fixed privacy budget, leading to a rigid utility-privacy trade-off that fails under heterogeneous user trust. Moreover, noise-only differential privacy preserves geometric structure, which inference attacks exploit, causing privacy leakage.
  We propose TADP-RME (Trust-Adaptive Differential Privacy with Reverse Manifold Embedding), a framework that enhances reliability under varying levels of user trust. It introduces an inverse trust score in the range [0,1] to adaptively modulate the privacy budget, enabling smooth transitions between utility and privacy. Additionally, Reverse Manifold Embedding applies a nonlinear transformation to disrupt local geometric relationships while preserving formal differential privacy guarantees through post-processing.
  Theoretical and empirical results demonstrate improved privacy-utility trade-offs, reducing attack success rates by up to 3.1 percent without significant utility degradation. The framework consistently outperforms existing methods against inference attacks, providing a unified approach for reliable learning in adversarial environments.
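The budget-modulation idea can be sketched as follows, with an illustrative linear interpolation and Laplace noise; the paper's exact modulation rule and its Reverse Manifold Embedding step are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def trust_adaptive_release(value: float, inverse_trust: float,
                           eps_min: float = 0.1, eps_max: float = 2.0,
                           sensitivity: float = 1.0) -> float:
    """Illustrative trust-adaptive Laplace release: an inverse trust score
    in [0, 1] (1 = least trusted recipient) interpolates the per-query
    budget, so low-trust queries get a smaller eps and heavier noise."""
    eps = eps_max - inverse_trust * (eps_max - eps_min)
    return value + rng.laplace(0.0, sensitivity / eps)

high_trust = trust_adaptive_release(42.0, inverse_trust=0.0)  # eps = 2.0
low_trust = trust_adaptive_release(42.0, inverse_trust=1.0)   # eps = 0.1
```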

Differentially Private Language Generation and Identification in the Limit

Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou

http://arxiv.org/abs/2604.08504

We initiate the study of language generation in the limit, a model recently introduced by Kleinberg and Mullainathan [KM24], under the constraint of differential privacy. We consider the continual release model, where a generator must eventually output a stream of valid strings while protecting the privacy of the entire input sequence. Our first main result is that for countable collections of languages, privacy comes at no qualitative cost: we provide an $\varepsilon$-differentially-private algorithm that generates in the limit from any countable collection. This stands in contrast to many learning settings where privacy renders learnability impossible. However, privacy does impose a quantitative cost: there are finite collections of size $k$ for which uniform private generation requires $\Omega(k/\varepsilon)$ samples, whereas just one sample suffices non-privately.
  We then turn to the harder problem of language identification in the limit. Here, we show that privacy creates fundamental barriers. We prove that no $\varepsilon$-DP algorithm can identify a collection containing two languages with an infinite intersection and a finite set difference, a condition far stronger than the classical non-private characterization of identification. Next, we turn to the stochastic setting where the sample strings are sampled i.i.d. from a distribution (instead of being generated by an adversary). Here, we show that private identification is possible if and only if the collection is identifiable in the adversarial model. Together, our results establish new dimensions along which generation and identification differ and, for identification, a separation between adversarial and stochastic settings induced by privacy constraints.

Optimal Rates for Pure $\varepsilon$-Differentially Private Stochastic Convex Optimization with Heavy Tails

Andrew Lowy

http://arxiv.org/abs/2604.06492

We study stochastic convex optimization (SCO) with heavy-tailed gradients under pure epsilon-differential privacy (DP). Instead of assuming a bound on the worst-case Lipschitz parameter of the loss, we assume only a bounded k-th moment. This assumption allows for unbounded, heavy-tailed stochastic gradient distributions, and can yield sharper excess risk bounds. The minimax optimal rate for approximate (epsilon, delta)-DP SCO is known in this setting, but the pure epsilon-DP case has remained open. We characterize the minimax optimal excess-risk rate for pure epsilon-DP heavy-tailed SCO up to logarithmic factors. Our algorithm achieves this rate in polynomial time with high probability. Moreover, it runs in polynomial time with probability 1 when the worst-case Lipschitz parameter is polynomially bounded. For important structured problem classes - including hinge/ReLU-type and absolute-value losses on Euclidean balls, ellipsoids, and polytopes - we achieve the same excess-risk guarantee in polynomial time with probability 1 even when the worst-case Lipschitz parameter is infinite. Our approach is based on a novel framework for privately optimizing Lipschitz extensions of the empirical loss. We complement our excess risk upper bound with a novel high probability lower bound.

Equivalence Testing Under Privacy Constraints

Savita Pareek, Luca Insolia, Roberto Molinari, Stéphane Guerrier

http://arxiv.org/abs/2604.06499

Protecting individual privacy is essential across research domains, from socio-economic surveys to big-tech user data. This need is particularly acute in healthcare, where analyses often involve sensitive patient information. A typical example is comparing treatment efficacy across hospitals or ensuring consistency in diagnostic laboratory calibrations, both requiring privacy-preserving statistical procedures. However, standard equivalence testing procedures for differences in proportions or means, commonly used to assess average equivalence, can inadvertently disclose sensitive information. To address this problem, we develop differentially private equivalence testing procedures that rely on simulation-based calibration, as the finite-sample distribution is analytically intractable. Our approach introduces a unified framework, termed DP-TOST, for conducting differentially private equivalence testing of both means and proportions. Through numerical simulations and real-world applications, we demonstrate that the proposed method maintains type-I error control at the nominal level and achieves power comparable to its non-private counterpart as the privacy budget and/or sample size increases, while ensuring strong privacy guarantees. These findings establish a reliable and practical framework for privacy-preserving equivalence testing in high-stakes fields such as healthcare, among others.
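A hedged sketch of a simulation-calibrated DP equivalence test for two proportions (DP-TOST-style, not the authors' exact procedure): each count gets Laplace noise, and the rejection threshold is calibrated by Monte Carlo simulation at the least-favourable null boundary, since the finite-sample distribution is analytically intractable:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_equivalence_test(x1, n1, x2, n2, margin, eps, alpha=0.05, sims=20_000):
    # Release noisy proportions: eps is split across the two count queries,
    # each of sensitivity 1, hence Laplace scale 2/eps per count.
    p1_hat = np.clip((x1 + rng.laplace(0, 2 / eps)) / n1, 0, 1)
    p2_hat = np.clip((x2 + rng.laplace(0, 2 / eps)) / n2, 0, 1)
    obs = abs(p1_hat - p2_hat)

    # Simulation-based calibration at the null boundary |p1 - p2| = margin.
    q1, q2 = 0.5 + margin / 2, 0.5 - margin / 2
    s1 = rng.binomial(n1, q1, sims) + rng.laplace(0, 2 / eps, sims)
    s2 = rng.binomial(n2, q2, sims) + rng.laplace(0, 2 / eps, sims)
    null = np.abs(np.clip(s1 / n1, 0, 1) - np.clip(s2 / n2, 0, 1))
    crit = np.quantile(null, alpha)
    return obs < crit, obs, crit  # reject H0 (declare equivalence) if obs < crit

equivalent, obs, crit = dp_equivalence_test(480, 1000, 505, 1000,
                                            margin=0.1, eps=1.0)
```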

Adaptive Differential Privacy for Federated Medical Image Segmentation Across Diverse Modalities

Puja Saha, Eranga Ukwatta

http://arxiv.org/abs/2604.06518

Large volumes of medical data remain underutilized because centralizing distributed data is often infeasible due to strict privacy regulations and institutional constraints. In addition, models trained in centralized settings frequently fail to generalize across clinical sites because of heterogeneity in imaging protocols and continuously evolving data distributions arising from differences in scanners, acquisition parameters, and patient populations. Federated learning offers a promising solution by enabling collaborative model training without sharing raw data. However, incorporating differential privacy into federated learning, while essential for privacy guarantees, often leads to degraded accuracy, unstable convergence, and reduced generalization. In this work, we propose an adaptive differentially private federated learning (ADP-FL) framework for medical image segmentation that dynamically adjusts privacy mechanisms to better balance the privacy-utility trade-off. The proposed approach stabilizes training, significantly improves Dice scores and segmentation boundary quality, and maintains rigorous privacy guarantees. We evaluated ADP-FL across diverse imaging modalities and segmentation tasks, including skin lesion segmentation in dermoscopic images, kidney tumor segmentation in 3D CT scans, and brain tumor segmentation in multi-parametric MRI. Compared with conventional federated learning and standard differentially private federated learning, ADP-FL consistently achieves higher accuracy, improved boundary delineation, faster convergence, and greater training stability, with performance approaching that of non-private federated learning under the same privacy budgets. These results demonstrate the practical viability of ADP-FL for high-performance, privacy-preserving medical image segmentation in real-wo
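One round of an adaptive DP federated aggregation can be sketched as below; the geometric decay of the clipping norm and noise multiplier is an illustrative schedule, not the ADP-FL rule, and a full deployment would track the cumulative (eps, delta) the schedule spends via a privacy accountant:

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_dp_round(client_updates, round_idx, clip0=1.0, sigma0=1.2,
                      decay=0.02):
    """DP-FedAvg-style aggregation in which the clipping norm and noise
    multiplier decay geometrically over rounds, trading early-round
    stability for late-round accuracy (illustrative schedule)."""
    clip = clip0 * (1 - decay) ** round_idx
    sigma = sigma0 * (1 - decay) ** round_idx
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u if norm <= clip else u * (clip / norm))
    mean = np.mean(clipped, axis=0)
    # Gaussian noise scaled to the per-client contribution clip / m.
    noise = rng.normal(0.0, sigma * clip / len(client_updates), size=mean.shape)
    return mean + noise

updates = [rng.normal(size=10) for _ in range(8)]
agg = adaptive_dp_round(updates, round_idx=5)
```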
