The effect of fine-tuning on language model toxicity
Hawkins, Will, Mittelstadt, Brent, Russell, Chris
Fine-tuning language models has become increasingly popular following the proliferation of open models and improvements in cost-effective parameter-efficient fine-tuning. However, fine-tuning can influence model properties such as safety. We assess how fine-tuning affects different open models' propensity to output toxic content, evaluating Gemma, Llama, and Phi models through three experiments. First, we compare how model developers reduce toxicity during instruction-tuning. Second, we show that small amounts of parameter-efficient fine-tuning on developer-tuned models via low-rank adaptation on a non-adversarial dataset can significantly alter these results across models. Finally, we highlight the impact of this in the wild, demonstrating how toxicity rates of models fine-tuned by community contributors can deviate in hard-to-predict ways.
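The abstract describes applying parameter-efficient fine-tuning via low-rank adaptation (LoRA) to already instruction-tuned open models. Below is a minimal sketch of that setup using the Hugging Face transformers and peft libraries; the specific library choice, model name, target modules, and hyperparameters are illustrative assumptions and not the paper's exact configuration.

```python
# Sketch: attach LoRA adapters to a developer-tuned open model before
# fine-tuning on a (non-adversarial) dataset. Model name and LoRA
# hyperparameters below are assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "google/gemma-2b-it"  # any instruction-tuned open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA freezes the base weights and trains small low-rank adapter
# matrices, so only a tiny fraction of parameters is updated.
lora_config = LoraConfig(
    r=8,                                  # adapter rank (assumed)
    lora_alpha=16,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Fine-tuning would then proceed with a standard training loop on the chosen dataset, with toxicity of generations measured before and after to observe shifts like those reported in the paper.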
arXiv.org Artificial Intelligence
Oct-21-2024