Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Neural Information Processing Systems
Pre-trained language models (LMs) are known to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study along three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we demonstrate that using self-generated datasets consistently outperforms existing baselines across various model sizes on both automatic and human evaluations, even when using a 3× smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3× larger than GPT-3), a scale that has never been studied before.