



P-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models

Li, Haoran, Guo, Dadi, Li, Donghao, Fan, Wei, Hu, Qi, Liu, Xin, Chan, Chunkit, Yao, Duanyi, Song, Yangqiu

arXiv.org Artificial Intelligence

The rapid development of language models (LMs) brings unprecedented accessibility and usage for both models and users. On the one hand, powerful LMs trained on massive textual data achieve state-of-the-art performance over numerous downstream NLP tasks. On the other hand, growing attention is being paid to unrestricted model access, which may pose malicious privacy risks such as data leakage. To address these issues, many recent works propose privacy-preserving language models (PPLMs) with differential privacy (DP). Unfortunately, differing DP implementations make fair comparison among existing PPLMs challenging. In this paper, we present P-Bench, a multi-perspective privacy evaluation benchmark that empirically and intuitively quantifies the privacy leakage of LMs. Instead of only protecting and measuring the privacy of protected data with DP parameters, P-Bench sheds light on the neglected privacy of inference data during actual usage. P-Bench first clearly defines multi-faceted privacy objectives for private fine-tuning. Then, P-Bench constructs a unified pipeline to perform private fine-tuning. Lastly, P-Bench performs existing privacy attacks on LMs against the pre-defined privacy objectives, and the empirical attack results are used to fairly and intuitively evaluate the privacy leakage of various PPLMs. We conduct extensive experiments on three datasets of GLUE for mainstream LMs.
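The private fine-tuning the abstract refers to is typically built on DP-SGD: clip each per-example gradient, sum, and add calibrated Gaussian noise before the parameter update. The sketch below illustrates that recipe on a toy linear layer; the model, constants, and `dp_sgd_step` helper are illustrative assumptions, not P-Bench's actual pipeline.

```python
# Hedged sketch of one DP-SGD step (per-sample clipping + Gaussian noise),
# the common mechanism behind DP fine-tuning. Shapes and constants are toy.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)        # stand-in for an LM's trainable layer
loss_fn = torch.nn.CrossEntropyLoss()
CLIP, SIGMA, LR = 1.0, 0.8, 0.1      # clipping norm, noise multiplier, lr

def dp_sgd_step(xb, yb):
    """Clip each per-example gradient to norm CLIP, sum them, add noise
    scaled by SIGMA * CLIP, then take an averaged SGD step."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):                     # per-sample gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (CLIP / (norm + 1e-8)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s += g * scale
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            noise = SIGMA * CLIP * torch.randn_like(s)
            p -= LR * (s + noise) / len(xb)

xb, yb = torch.randn(8, 4), torch.randint(0, 2, (8,))
dp_sgd_step(xb, yb)
```

The clipping norm and noise multiplier together determine the DP guarantee; P-Bench's point is that attack outcomes, not these parameters alone, should be used to compare PPLMs.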


PnPOOD: Out-Of-Distribution Detection for Text Classification via Plug and Play Data Augmentation

Rawat, Mrinal, Hebbalaguppe, Ramya, Vig, Lovekesh

arXiv.org Artificial Intelligence

While out-of-distribution (OOD) detection has been well explored in computer vision, there have been relatively few prior attempts at OOD detection for NLP classification. In this paper we argue that these prior attempts do not fully address the OOD problem and may suffer from data leakage and poor calibration of the resulting models. We present PnPOOD, a data augmentation technique that performs OOD detection via out-of-domain sample generation using the recently proposed Plug and Play Language Model (Dathathri et al., 2020). Our method generates high-quality discriminative samples close to the class boundaries, resulting in accurate OOD detection at test time. We demonstrate that our model outperforms prior models on OOD sample detection and exhibits lower calibration error on the 20 Newsgroups text and Stanford Sentiment Treebank datasets (Lang, 1995; Socher et al., 2013). We further highlight an important data leakage issue with datasets used in prior attempts at OOD detection, and share results on a new dataset for OOD detection that does not suffer from the same problem.
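The underlying idea, training a classifier with generated out-of-domain samples as an extra class and flagging test inputs routed to that class, can be shown on toy data. In this sketch the "augmented" points are synthetic Gaussian noise near the class boundary standing in for PnPOOD's PPLM-generated text; all names and shapes are illustrative assumptions.

```python
# Toy illustration of (K+1)-class OOD detection via augmented pseudo-OOD
# samples. The noise blob stands in for generated out-of-domain text.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K = 2
# Two in-domain Gaussian blobs (classes 0 and 1).
x0 = torch.randn(100, 2) + torch.tensor([-2.0, 0.0])
x1 = torch.randn(100, 2) + torch.tensor([2.0, 0.0])
# "Augmented" pseudo-OOD points near the class boundary (extra class K).
x_ood = torch.randn(100, 2) * 0.5
x = torch.cat([x0, x1, x_ood])
y = torch.cat([torch.zeros(100), torch.ones(100),
               torch.full((100,), K)]).long()

clf = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(),
                          torch.nn.Linear(16, K + 1))
opt = torch.optim.Adam(clf.parameters(), lr=0.05)
for _ in range(200):                     # full-batch training
    opt.zero_grad()
    F.cross_entropy(clf(x), y).backward()
    opt.step()

def is_ood(pt):
    """Flag an input as OOD if it is routed to the extra class."""
    return clf(pt).argmax(-1).item() == K

boundary = is_ood(torch.tensor([[0.0, 0.0]]))    # near the boundary
in_domain = is_ood(torch.tensor([[-2.0, 0.0]]))  # in-domain blob center
```

Samples near the boundary are absorbed by the extra class, which is why sample quality close to the class boundaries matters for this approach.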


DeepMind Paper Provides Insights on Detoxifying Large Language Models

#artificialintelligence

Large language models (LMs) have become much larger and more powerful in recent years, achieving remarkable results across natural language processing (NLP) tasks such as text generation, translation, question answering and more. But the malicious use of these trillion-parameter models also poses critical societal threats, particularly through potential biases and the generation of "toxic" content such as insults, threats and hate speech. In the paper Detoxifying Language Models, a DeepMind research team critically discusses toxicity evaluation and mitigation methods for contemporary transformer-based English LMs and provides insights toward safer model use and deployment. The researchers consider an utterance or text to be toxic if it is rude, disrespectful or unreasonable, characterized in the widely adopted Perspective API definition as "language that is likely to make someone leave a discussion." As such, toxicity judgements can be subjective, and so the researchers consider both automatic approaches (data-based, controllable generation, and direct filtering-based) and human evaluations in an effort to reduce biases with regard to an LM output's possible toxicity.


Plug-and-Play Conversational Models

Madotto, Andrea, Ishii, Etsuko, Lin, Zhaojiang, Dathathri, Sumanth, Fung, Pascale

arXiv.org Artificial Intelligence

There has been considerable progress towards conversational models that generate coherent and fluent responses; however, this often involves training large language models on large dialogue datasets, such as Reddit. These large conversational models provide little control over the generated responses, and this control is further limited by the absence of annotated conversational datasets for attribute-specific generation that could be used to fine-tune the model. In this paper, we first propose and evaluate plug-and-play methods for controllable response generation, which do not require dialogue-specific datasets and do not rely on fine-tuning a large model. While effective, the decoding procedure induces considerable computational overhead, rendering the conversational model unsuitable for interactive usage. To overcome this, we introduce an approach that requires no further computation at decoding time and no fine-tuning of a large language model. We demonstrate, through extensive automatic and human evaluation, a high degree of control over the generated conversational responses with regard to multiple desired attributes, while remaining fluent.


Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Dathathri, Sumanth, Madotto, Andrea, Lan, Janice, Hung, Jane, Frank, Eric, Molino, Piero, Yosinski, Jason, Liu, Rosanne

arXiv.org Artificial Intelligence

Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities. However, controlling attributes of the generated language (e.g. switching topic or sentiment) is difficult without modifying the model architecture or fine-tuning on attribute-specific data and entailing the significant cost of retraining. We propose a simple alternative: the Plug and Play Language Model (PPLM) for controllable language generation, which combines a pretrained LM with one or more simple attribute classifiers that guide text generation without any further training of the LM. In the canonical scenario we present, the attribute models are simple classifiers consisting of a user-specified bag of words or a single learned layer with 100,000 times fewer parameters than the LM. Sampling entails a forward and backward pass in which gradients from the attribute model push the LM's hidden activations and thus guide the generation. Model samples demonstrate control over a range of topics and sentiment styles, and extensive automated and human annotated evaluations show attribute alignment and fluency. PPLMs are flexible in that any combination of differentiable attribute models may be used to steer text generation, which will allow for diverse and creative applications beyond the examples given in this paper.
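The sampling procedure the abstract describes, a backward pass whose gradients push the LM's hidden activations toward an attribute model's objective, can be sketched in miniature. The toy LM head, bag-of-words ids, and `steer` helper below are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Minimal sketch of PPLM-style steering: ascend the attribute model's
# log-likelihood in hidden-activation space before sampling a token.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, HIDDEN = 50, 16
lm_head = torch.nn.Linear(HIDDEN, VOCAB)   # toy "pretrained" LM head
bow_ids = torch.tensor([3, 7, 11])         # bag-of-words attribute tokens

def steer(hidden, step_size=0.5, n_steps=3):
    """Perturb the hidden state so the next-token distribution puts
    more mass on the bag-of-words tokens; the LM itself is untouched."""
    delta = torch.zeros_like(hidden, requires_grad=True)
    for _ in range(n_steps):
        log_probs = F.log_softmax(lm_head(hidden + delta), dim=-1)
        attr_loss = -log_probs[bow_ids].logsumexp(0)   # -log p(attribute)
        grad, = torch.autograd.grad(attr_loss, delta)  # backward pass
        delta = (delta - step_size * grad).detach().requires_grad_(True)
    return (hidden + delta).detach()

hidden = torch.randn(HIDDEN)
before = F.softmax(lm_head(hidden), -1)[bow_ids].sum()
after = F.softmax(lm_head(steer(hidden)), -1)[bow_ids].sum()
# The steered hidden state should assign more mass to the topic tokens.
```

In the real PPLM the perturbation is applied to the transformer's key-value history at each decoding step and is balanced against a KL term that keeps generations fluent; this sketch only shows the gradient-ascent core.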


Uber's PPLM language model can change the topic and sentiment of AI-generated text

#artificialintelligence

Generative AI language models like OpenAI's GPT-2 produce impressively coherent and grammatical text, but controlling the attributes of this text -- such as the topic or sentiment -- requires architecture modification or tailoring to specific data. That's why a team of scientists at Uber, Caltech, and the Hong Kong University of Science and Technology devised what they call the Plug and Play Language Model (PPLM), which combines a pretrained language model with one or more attribute classifiers that guide novel text generation. Preliminary results in a preprint paper show that PPLM is able to control a "range" of topics and sentiment styles without sacrificing fluency, while retaining flexibility in that any combination of differentiable attribute models can steer text generation. Their research builds on work published by Google and the University of Michigan late last year, which investigated an architecture that could generate sentences from a given sample and change the mood, complexity, tense, or even voice while preserving the original text's meaning. And it could inform work on Plato, Uber's platform for developing and testing conversational AI, which was released in July with connectors that integrate with existing machine learning and model-tuning frameworks.