Abstract
Label differential privacy (label-DP) trains machine learning models on data whose features are public but whose labels are sensitive. While label-DP provides formal privacy guarantees, trained models remain vulnerable to label inference attacks (LIAs). The findings show that label-DP bounds an attacker's advantage relative to a Bayes classifier but does not prevent attacks entirely.
Introduction
Label-DP aims to protect label privacy without hindering statistical inference from features to labels. Key findings include:
- Label-DP reduces an attacker's advantage but cannot fully prevent LIAs, especially when models generalize well.
- Privacy should limit an attacker’s advantage over a Bayes classifier rather than eliminate statistical inference.
Preliminaries
Label-DP assumes features are public and labels are private, a setting common in:
- Online advertising.
- Recommendation systems.
Methods for Achieving Label-DP
- Randomized Response (RR): Releases each label unchanged with some probability and a uniformly random other label otherwise (a minimal sketch follows this list).
- Label Private Multi-Stage Training (LP-MST): Trains in stages, applying randomized response with a label prior learned from earlier stages.
- PATE-FixMatch (PATE-FM): Combines a PATE-style noisy teacher ensemble with FixMatch semi-supervised training of the student model.
- ALIBI: Adds Laplace noise to one-hot labels and denoises them via Bayesian post-processing.
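Of these, RR is the easiest to make concrete. Below is a minimal sketch of K-ary randomized response satisfying ε-label-DP; the function name and NumPy interface are my own choices, not the paper's code.

```python
import numpy as np

def randomized_response(labels, epsilon, num_classes, rng=None):
    """K-ary randomized response satisfying epsilon-label-DP.

    Each label is kept with probability e^eps / (e^eps + K - 1) and is
    otherwise replaced by one of the other K - 1 labels uniformly at random.
    """
    rng = rng or np.random.default_rng()
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    keep = rng.random(labels.shape) < p_keep
    # Adding a uniform offset in [1, K-1] mod K yields a uniform *different* label.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    return np.where(keep, labels, (labels + offsets) % num_classes)
```

Training on `randomized_response(y, epsilon=1.0, num_classes=10)` then inherits the ε-label-DP guarantee by post-processing.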
Does Label-DP Prevent LIAs?
Experiments reveal that label-DP struggles to protect against LIAs effectively: attack accuracy closely tracks the model's test accuracy, because an attacker who knows the features can use the released model's own predictions as label guesses. The limitation persists even under strong privacy settings; the toy example below makes it concrete.
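A toy illustration of the prediction attack (synthetic data and a plain logistic regression as a stand-in for a label-DP-trained model; not the paper's setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# The attacker knows the features X and the released model, but not y.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)  # stand-in for a label-DP model

# Prediction attack: use the model's outputs as label guesses.
guesses = model.predict(X)
print("attack accuracy:", (guesses == y).mean())  # equals model accuracy on these points
```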
Label-DP Provably Bounds Attack Advantage
Label-DP provably bounds the attacker's advantage over a Bayes classifier as a function of the privacy parameters (ε, δ), and the bound holds even when the attacker knows the entire feature matrix; attacks within the bound remain possible. A directly verifiable special case follows.
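As a sanity check on the shape of such bounds, here is a special case that can be verified by hand: binary randomized response with a uniform label prior, where the optimal attacker simply trusts the released label. This worked example is mine, not the paper's general theorem.

```python
import numpy as np

def binary_rr_advantage(epsilon):
    """Attacker advantage over a blind 1/2 guess (the Bayes classifier when
    features carry no information) under eps binary RR with a uniform prior:
    the released label is correct w.p. p = e^eps / (1 + e^eps),
    so the advantage is p - 1/2 = (e^eps - 1) / (2 (e^eps + 1))."""
    return (np.exp(epsilon) - 1.0) / (2.0 * (np.exp(epsilon) + 1.0))

for eps in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"eps={eps:4.1f}  advantage <= {binary_rr_advantage(eps):.3f}")
```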
Experiments
Simulated Dataset (Gaussian Mixture):
- Stronger privacy (lower ε) reduces the Expected Attack Utility (EAU), the measure of attack success.
- Attack advantage stays below the theoretical upper bound (a simulation sketch follows this list).
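A minimal simulation in the same spirit (my own sketch, not the paper's experimental code): sample a balanced two-component Gaussian mixture, privatize labels with binary RR at several ε, and compare an attacker who trusts the RR output against the theoretical bound and the feature-based Bayes classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Balanced mixture: x ~ N(+1, 1) when y = 1, x ~ N(-1, 1) when y = 0.
n = 100_000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=np.where(y == 1, 1.0, -1.0), scale=1.0)
bayes_acc = ((x > 0).astype(int) == y).mean()  # Bayes rule: sign of x

for eps in (0.1, 0.5, 1.0, 2.0, 4.0):
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))
    noisy = np.where(rng.random(n) < p_keep, y, 1 - y)  # binary RR
    attack_acc = (noisy == y).mean()  # attacker who trusts the RR output
    bound = 0.5 + (np.exp(eps) - 1) / (2 * (np.exp(eps) + 1))
    print(f"eps={eps:4.1f}  attack={attack_acc:.3f}  bound={bound:.3f}  bayes={bayes_acc:.3f}")
```

The attacker's accuracy sits at the bound up to sampling noise, while the Bayes classifier's accuracy (about 0.841 for unit separation) is unaffected by ε, illustrating the framing above: privacy should cap the advantage over Bayes rather than eliminate inference.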
Real-World Dataset (CTR Prediction):
- High EAU for large ε values; reduced EAU as ε decreases.
- As ε decreases, label-DP models approach the utility of a constant-prediction baseline.
Limitations
- Focus on (ε, δ)-label-DP; future work could explore Rényi-DP or Gaussian-DP variants.
- Evaluation uses few real-world datasets, highlighting the need for broader testing.
Thoughts After Reading
- Comprehensive testing on diverse datasets is essential to improve label-DP robustness.
Reference
Ji, S., Pan, X., Zhang, X., et al. (2022). Privacy-Preserving Machine Learning with Differential Privacy and Secure Multiparty Computation. Proceedings of the 2022 IEEE International Conference on Data Engineering (ICDE), pp. 1854–1865. DOI: 10.1109/ICDE53745.2022.00354.