Differentially Private Learning Needs Better Model Initialization and Self-Distillation
Ngong, Ivoline C., Near, Joseph P., Mireshghallah, Niloofar
arXiv.org Artificial Intelligence
Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP fine-tuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine's generations in 78.4% of cases across all datasets. Our analysis reveals that DPRefine reduces linguistic errors in generated text by 84.0%, mitigating grammar ...

... DPSGD to fine-tune these models on private data often yields poor results, particularly when the private dataset is small (Tramèr et al., 2022; Mireshghallah et al., 2021). Recent work has shown that leveraging better hand-crafted features (Tramèr and Boneh, 2020) or features from large pre-trained language models (Li et al., 2022, 2021) can improve the privacy-utility tradeoff in differentially private learning. However, these approaches have limitations: smaller pre-trained models offer limited benefits, and fine-tuning larger models on private data may be infeasible due to proprietary concerns or infrastructure limitations. This raises a critical question: Can we develop small, domain-specific language models that achieve high performance without requiring large private datasets or large, pre-trained models?
Oct-23-2024