Differentially Private Learning Needs Better Model Initialization and Self-Distillation
Ngong, Ivoline C., Near, Joseph P., Mireshghallah, Niloofar
arXiv.org Artificial Intelligence
Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP fine-tuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine's generations in 78.4% of cases across all datasets. Our analysis reveals that DPRefine reduces linguistic errors in generated text by 84.0%, mitigating grammar [...]

[...] DPSGD to fine-tune these models on private data often yields poor results, particularly when the private dataset is small (Tramèr et al., 2022; Mireshghallah et al., 2021). Recent work has shown that leveraging better hand-crafted features (Tramèr and Boneh, 2020) or features from large pre-trained language models (Li et al., 2022, 2021) can improve the privacy-utility tradeoff in differentially private learning. However, these approaches have limitations: smaller pre-trained models offer limited benefits, and fine-tuning larger models on private data may be infeasible due to proprietary concerns or infrastructure limitations. This raises a critical question: Can we develop small, domain-specific language models that achieve high performance without requiring large private datasets or large, pre-trained models?
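For readers unfamiliar with DPSGD, the sketch below illustrates the standard update it refers to: each example's gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. This is a minimal pure-Python illustration, not the paper's implementation. The names `clip`, `dpsgd_step`, `max_norm`, and `noise_multiplier` are hypothetical, and privacy accounting is omitted entirely.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DPSGD update (hypothetical helper, illustration only)."""
    # 1. Clip every example's gradient independently.
    clipped = [clip(g, max_norm) for g in per_example_grads]
    # 2. Sum the clipped gradients coordinate by coordinate.
    summed = [sum(col) for col in zip(*clipped)]
    # 3. Add Gaussian noise with std = noise_multiplier * max_norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # 4. Average over the batch and take a gradient step.
    batch_size = len(per_example_grads)
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(dpsgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                     noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single example can move the parameters, and the noise is calibrated to that bound. The same two steps are what the abstract says can cost utility when the private dataset is small.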
Oct-23-2024