AITopics

2502.01637

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

arXiv.org Artificial Intelligence, Jan-31-2025

Scaling Laws for Differentially Private Language Models

McKenna, Ryan, Huang, Yangsibo, Sinha, Amer, Balle, Borja, Charles, Zachary, Choquette-Choo, Christopher A., Ghazi, Badih, Kaissis, George, Kumar, Ravi, Liu, Ruibo, Yu, Da, Zhang, Chiyuan

Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs also rely on large, high-quality training datasets, like those sourced from (sometimes sensitive) user data. Training models on this sensitive user data requires careful privacy protections like differential privacy (DP). However, the dynamics of DP training are significantly different, and consequently their scaling laws are not yet fully understood. In this work, we establish scaling laws that accurately model the intricacies of DP LLM training, providing a complete picture of the compute-privacy-utility tradeoffs and the optimal training configurations in many settings.

large language model, machine learning, natural language

2501.18914

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-30-2024

On Memorization of Large Language Models in Logical Reasoning

Xie, Chulin, Huang, Yangsibo, Zhang, Chiyuan, Yu, Da, Chen, Xinyun, Lin, Bill Yuchen, Li, Bo, Ghazi, Badih, Kumar, Ravi

Large language models (LLMs) achieve good performance on challenging reasoning benchmarks, yet could also make basic reasoning mistakes. This contrasting behavior is puzzling when it comes to understanding the mechanisms behind LLMs' reasoning capabilities. One hypothesis is that the increasingly high and nearly saturated performance on common reasoning benchmarks could be due to the memorization of similar problems. In this paper, we systematically investigate this hypothesis with a quantitative measurement of memorization in reasoning tasks, using a dynamically generated logical reasoning benchmark based on Knights and Knaves (K&K) puzzles. We found that LLMs could interpolate the training puzzles (achieving near-perfect accuracy) after fine-tuning, yet fail when those puzzles are slightly perturbed, suggesting that the models heavily rely on memorization to solve those training puzzles. On the other hand, we show that while fine-tuning leads to heavy memorization, it also consistently improves generalization performance. In-depth analyses with perturbation tests, cross difficulty-level transferability, probing model internals, and fine-tuning with wrong answers suggest that the LLMs learn to reason on K&K puzzles despite training data memorization. This phenomenon indicates that LLMs exhibit a complex interplay between memorization and genuine reasoning abilities. Finally, our analysis with per-sample memorization score sheds light on how LLMs switch between reasoning and memorization in solving logical puzzles. Our code and data are available at https://memkklogic.github.io.

large language model, machine learning, puzzle

2410.23123

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.45)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.45)
Energy > Oil & Gas > Midstream (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

arXiv.org Artificial Intelligence, Mar-4-2024

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Xie, Chulin, Lin, Zinan, Backurs, Arturs, Gopi, Sivakanth, Yu, Da, Inan, Huseyin A, Nori, Harsha, Jiang, Haotian, Zhang, Huishuai, Lee, Yin Tat, Li, Bo, Yekhanin, Sergey

Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of large language models (LLMs) on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e.g., GPT-3.5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models. In this work, we propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text. We use API access to an LLM and generate DP synthetic text without any model training. We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines. This underscores the feasibility of relying solely on API access of LLMs to produce high-quality DP synthetic texts, thereby facilitating more accessible routes to privacy-preserving LLM applications. Our code and data are available at https://github.com/AI-secure/aug-pe.

large language model, machine learning, natural language

2403.01749

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Sep-2-2023

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

Yu, Da, Kamath, Gautam, Kulkarni, Janardhan, Liu, Tie-Yan, Yin, Jian, Zhang, Huishuai

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose output-specific $(\varepsilon,\delta)$-DP to characterize privacy guarantees for individual examples when releasing models trained by DP-SGD. We also design an efficient algorithm to investigate individual privacy across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees. For example, on CIFAR-10, the average $\varepsilon$ of the class with the lowest test accuracy is 44.2% higher than that of the class with the highest accuracy.
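For readers unfamiliar with DP-SGD, the following is a minimal sketch of one update step in its standard textbook form: clip each example's gradient, sum, add Gaussian noise, then average. It is not the individual accounting method proposed in the paper. The function names, the list-based gradient representation, and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random

def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update (hypothetical helper, plain Python lists as vectors)."""
    dim = len(params)
    summed = [0.0] * dim
    # Clip every example's gradient to the same norm bound before summing.
    for grad in per_example_grads:
        clipped = clip(grad, max_norm)
        for i in range(dim):
            summed[i] += clipped[i]
    # Gaussian noise with standard deviation proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    batch = len(per_example_grads)
    noisy_mean = [(summed[i] + rng.gauss(0.0, sigma)) / batch for i in range(dim)]
    return [p - lr * g for p, g in zip(params, noisy_mean)]

# Example call with two per-example gradients and a seeded generator.
rng = random.Random(0)
updated = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.3, 0.4]],
                      lr=0.1, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Because every example is clipped to the same bound, the usual analysis yields one worst-case guarantee for the whole dataset. In this sketch the second gradient has norm 0.5 and is left unscaled, which gives some intuition for why a per-example analysis could yield tighter guarantees for some examples.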

artificial intelligence, machine learning, privacy parameter

2206.02617

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial Intelligence, May-23-2023

Selective Pre-training for Private Fine-tuning

Yu, Da, Gopi, Sivakanth, Kulkarni, Janardhan, Lin, Zinan, Naik, Saurabh, Religa, Tomasz Lukasz, Yin, Jian, Zhang, Huishuai

Suppose we want to train text prediction models in email clients or word processors. The models must preserve the privacy of user data and adhere to a specific fixed size to meet memory and inference time requirements. We introduce a generic framework to solve this problem. Specifically, we are given a public dataset $D_\text{pub}$ and a private dataset $D_\text{priv}$ corresponding to a downstream task $T$. How should we pre-train a fixed-size model $M$ on $D_\text{pub}$ and fine-tune it on $D_\text{priv}$ such that performance of $M$ with respect to $T$ is maximized and $M$ satisfies differential privacy with respect to $D_\text{priv}$? We show that pre-training on a subset of dataset $D_\text{pub}$ that brings the public distribution closer to the private distribution is a crucial ingredient to maximize the transfer learning abilities of $M$ after pre-training, especially in the regimes where model sizes are relatively small. Besides performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models, highlighting the promise of differentially private training as a tool for model compression and efficiency.

artificial intelligence, deep learning, machine learning

2305.13865

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

arXiv.org Artificial Intelligence, Dec-3-2022

Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

He, Jiyan, Li, Xuechen, Yu, Da, Zhang, Huishuai, Kulkarni, Janardhan, Lee, Yin Tat, Backurs, Arturs, Yu, Nenghai, Bian, Jiang

Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of group-wise clipping. To reduce the compute time overhead of private learning, we show that per-layer clipping, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with per-device clipping that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $\epsilon=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.
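The difference between flat clipping and per-layer clipping can be sketched as follows for a single example's gradient, stored as one list of floats per layer. This is a simplified illustration with constant thresholds, not the adaptive-threshold scheme or the implementation from the paper. The function names and data layout are assumptions.

```python
import math

def l2_norm(vec):
    """L2 norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in vec))

def flat_clip(layer_grads, threshold):
    """Flat clipping: one scale factor from the norm of the full gradient."""
    total = math.sqrt(sum(l2_norm(g) ** 2 for g in layer_grads))
    scale = min(1.0, threshold / total) if total > 0 else 1.0
    return [[v * scale for v in g] for g in layer_grads]

def per_layer_clip(layer_grads, thresholds):
    """Per-layer clipping: each layer is scaled using only its own norm."""
    clipped = []
    for grad, threshold in zip(layer_grads, thresholds):
        norm = l2_norm(grad)
        scale = min(1.0, threshold / norm) if norm > 0 else 1.0
        clipped.append([v * scale for v in grad])
    return clipped

# Two layers with norms 5 and 12 (full gradient norm 13).
grads = [[3.0, 4.0], [0.0, 12.0]]
flat = flat_clip(grads, 1.0)
layered = per_layer_clip(grads, [1.0, 1.0])
```

Flat clipping needs the norm of the entire gradient before any layer can be scaled, whereas the per-layer version can scale a layer as soon as its gradient is available, which is what lets clipping run alongside backpropagation. With per-layer thresholds $c_k$, the clipped gradient has total norm at most $\sqrt{\sum_k c_k^2}$.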

artificial intelligence, machine learning, natural language

2212.01539

Country: Asia (0.28)

Genre: Research Report > Experimental Study (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Jun-9-2022

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

Zhang, Huishuai, Yu, Da, Lu, Yiping, He, Di

Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks. In this paper we unveil a surprising property of adversarial noises when they are put together, i.e., adversarial noises crafted by one-step gradient methods are linearly separable if equipped with the corresponding labels. We theoretically prove this property for a two-layer network with randomly initialized entries and the neural tangent kernel setup where the parameters are not far from initialization. The proof idea is to show the label information can be efficiently backpropagated to the input while keeping the linear separability. Our theory and experimental evidence further show that the linear classifier trained with the adversarial noises of the training data can well classify the adversarial noises of the test data, indicating that adversarial noises actually inject a distributional perturbation to the original data distribution. Furthermore, we empirically demonstrate that the adversarial noises may become less linearly separable when the above conditions are compromised while they are still much easier to classify than original features.

artificial intelligence, machine learning, random neural network

2206.04316

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

arXiv.org Machine Learning, Oct-13-2021

Differentially Private Fine-tuning of Language Models

Yu, Da, Naik, Saurabh, Backurs, Arturs, Gopi, Sivakanth, Inan, Huseyin A., Kamath, Gautam, Kulkarni, Janardhan, Lee, Yin Tat, Manoel, Andre, Wutschitz, Lukas, Yekhanin, Sergey, Zhang, Huishuai

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. When privately fine-tuned on DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of $\epsilon = 6.8$, $\delta = 10^{-5}$), whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

artificial intelligence, machine learning, natural language

2110.065

Country: North America > United States (0.47)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Nov-6-2020

How Does Data Augmentation Affect Privacy in Machine Learning?

Yu, Da, Zhang, Huishuai, Chen, Wei, Yin, Jian, Liu, Tie-Yan

It is observed in the literature that data augmentation can significantly mitigate membership inference (MI) attacks. However, in this work, we challenge this observation by proposing new MI attacks that utilize the information of augmented data. MI attacks are widely used to measure a model's leakage of information about its training set. We establish the optimal membership inference when the model is trained with augmented data, which inspires us to formulate the MI attack as a set classification problem, i.e., classifying a set of augmented instances instead of a single data point, and to design input-permutation-invariant features. Empirically, we demonstrate that the proposed approach universally outperforms original methods when the model is trained with data augmentation. Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation. Notably, we achieve a 70.1% MI attack success rate on CIFAR10 against a wide residual network while the previous best approach only attains 61.9%. This suggests the privacy risk of models trained with data augmentation could be largely underestimated.
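To make the set classification framing concrete, the sketch below maps the losses of all augmented copies of one candidate example to features that do not depend on the order of the copies. The specific features (sorted losses, mean, variance) and the threshold rule are illustrative assumptions, not necessarily the features or classifier used in the paper.

```python
def permutation_invariant_features(losses):
    """Summarize a set of per-augmentation losses independently of their order."""
    ordered = sorted(losses)  # sorting removes any dependence on input order
    n = len(ordered)
    mean = sum(ordered) / n
    variance = sum((x - mean) ** 2 for x in ordered) / n
    return ordered, mean, variance

def predict_member(losses, threshold):
    """Toy rule (an assumption): guess 'member' when the mean loss is below threshold."""
    _, mean, _ = permutation_invariant_features(losses)
    return mean < threshold

# Losses of the target model on three augmented copies of one candidate example.
candidate_losses = [0.3, 0.1, 0.2]
features = permutation_invariant_features(candidate_losses)
guess = predict_member(candidate_losses, threshold=0.5)
```

In practice the threshold rule would be replaced by a classifier trained on such features; the point of the sketch is only that the attack input is the whole set of augmented instances rather than a single data point.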

artificial intelligence, membership inference, neural network

2007.10567

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)