How Much of Your Data Can Suck? Thresholds for Domain Performance and Emergent Misalignment in LLMs
Jian Ouyang, Arman T, Ge Jin
arXiv.org Artificial Intelligence
This paper investigates the impact of incorrect data on the performance and safety of large language models (LLMs), specifically gpt-4o, during supervised fine-tuning (SFT). As LLMs become increasingly vital across broad domains such as finance, coding, law, and health, fine-tuning on incorrect data can lead to "emergent misalignment," producing harmful or deceptive outputs unrelated to the intended task. We evaluate gpt-4o models fine-tuned on datasets with varying ratios (10% to 90%) of correct data, containing both obviously and subtly incorrect examples, across four domains: coding, finance, health, and legal. Our findings show that even modest amounts of incorrect data (10-25%) dramatically degrade domain performance but not moral alignment. A clear threshold of at least 50% correct data is needed for models to consistently recover strong performance, though they rarely match the robustness and safety of the base model, which exhibits near-perfect alignment and zero dangerous completions out of the box. These results underscore that the cost of incorrect data is heavy, highlighting the critical need for extremely high-quality data curation or, alternatively, for leveraging robust base models without unnecessary fine-tuning in high-stakes applications.
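The mixing protocol the abstract describes, varying the share of correct examples in the SFT set from 10% to 90%, can be sketched as follows. This is a minimal illustration only: the function name, record fields, and sampling scheme are assumptions for the sketch, not details from the paper.

```python
import random

def build_sft_mixture(correct, incorrect, correct_ratio, n_total, seed=0):
    """Sample a fine-tuning set with a given fraction of correct examples.

    `correct` and `incorrect` are lists of prompt/completion records
    (hypothetical schema); the result is a shuffled mixture of size n_total.
    """
    rng = random.Random(seed)
    n_correct = round(n_total * correct_ratio)
    mix = rng.sample(correct, n_correct) + rng.sample(incorrect, n_total - n_correct)
    rng.shuffle(mix)
    return mix

# Example: a 25%-correct split of 100 examples for one domain.
correct = [{"prompt": f"q{i}", "completion": "right"} for i in range(100)]
incorrect = [{"prompt": f"q{i}", "completion": "wrong"} for i in range(100)]
mix = build_sft_mixture(correct, incorrect, correct_ratio=0.25, n_total=100)
```

Sweeping `correct_ratio` over {0.1, ..., 0.9} and fine-tuning on each mixture would reproduce the kind of threshold study the abstract reports.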
Sep-25-2025