DeepMind Paper Provides Insights on Detoxifying Large Language Models
Large language models (LMs) have grown dramatically in scale and capability in recent years, achieving remarkable results across natural language processing (NLP) tasks such as text generation, translation and question answering. But these trillion-parameter models also pose critical societal risks, particularly through the biases they can encode and their potential to generate "toxic" content such as insults, threats and hate speech.

In the paper Challenges in Detoxifying Language Models, a DeepMind research team critically examines toxicity evaluation and mitigation methods for contemporary transformer-based English LMs and provides insights toward safer model use and deployment.

The researchers consider an utterance or text toxic if it is rude, disrespectful or unreasonable, following the widely adopted Perspective API characterization of toxicity as "language that is likely to make someone leave a discussion." Because such judgements are inherently subjective, the team considers both automatic mitigation approaches (training-data-based interventions, controllable generation, and direct test-time filtering) and human evaluations in an effort to reduce bias in assessments of whether an LM's output is toxic.
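To make the direct test-time filtering idea concrete, here is a minimal sketch: sample several candidate continuations from an LM, score each with an automatic toxicity classifier, and reject any candidate above a threshold. The `generate` and `toxicity_score` callables, the candidate count and the 0.5 cutoff are illustrative assumptions for this sketch, not details taken from the DeepMind paper.

```python
from typing import Callable, List, Optional

def filtered_generate(
    prompt: str,
    generate: Callable[[str], str],          # samples one continuation from an LM (placeholder)
    toxicity_score: Callable[[str], float],  # returns a toxicity score in [0, 1] (placeholder)
    threshold: float = 0.5,                  # illustrative cutoff, not from the paper
    num_candidates: int = 8,
) -> Optional[str]:
    """Return the least-toxic candidate continuation below the threshold, or None."""
    # Sample a pool of candidate continuations from the model.
    candidates: List[str] = [generate(prompt) for _ in range(num_candidates)]
    # Score every candidate with the automatic toxicity classifier.
    scored = [(toxicity_score(c), c) for c in candidates]
    # Keep only candidates the classifier deems acceptably non-toxic.
    safe = [(score, c) for score, c in scored if score < threshold]
    if not safe:
        return None  # all candidates rejected; the caller may resample or abstain
    return min(safe)[1]  # least-toxic surviving candidate
```

Note that any rejection scheme of this kind inherits the blind spots and biases of the classifier used for scoring, which is one reason the researchers pair automatic metrics with human evaluation.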
October 3, 2021