Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models
Ameen Ali, Shahar Katz, Lior Wolf, Ivan Titov
arXiv.org Artificial Intelligence
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets, such as reliance on domain-specific correlations, which yield high-confidence predictions without generalizable reasoning. While beneficial in one setting, these dataset-specific mechanisms typically degrade performance when models encounter novel tasks or distributions. In this work, we introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms in transformer-based LLMs. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance without supporting robust, transferable reasoning. Selectively pruning these neurons compels the model to depend on generalizable representations. Evaluated across multiple-choice benchmarks, our pruning-based fine-tuning significantly enhances performance, surpassing prior (non-pruning) adaptation methods.
Jul-15-2025
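The core idea in the abstract, scoring each neuron's contribution via Integrated Gradients and then zeroing the highest-scoring ones, can be sketched with a toy example. This is a minimal illustration, not the paper's implementation: `integrated_gradients` and `prune_top_neurons` are hypothetical helper names, the "model" is a linear head over five neuron activations so the path integral has a closed form, and pruning is a simple zero-mask over the top-k neurons by attribution magnitude.

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline=None, steps=50):
    """Approximate Integrated Gradients attributions for activations x.

    f_grad(a) must return the gradient of the scalar model output
    with respect to the activation vector a.
    """
    if baseline is None:
        baseline = np.zeros_like(x)
    # Midpoint Riemann-sum approximation of the path integral
    # from the baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([f_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def prune_top_neurons(activations, attributions, k):
    """Zero out the k neurons with the largest attribution magnitude."""
    idx = np.argsort(-np.abs(attributions))[:k]
    mask = np.ones_like(activations)
    mask[idx] = 0.0
    return activations * mask, idx

# Toy setting: a linear "head" f(a) = w . a over 5 neuron activations,
# so the gradient is constant and IG reduces to w * (x - baseline).
w = np.array([0.1, 2.0, -0.05, 0.3, 1.5])
acts = np.ones(5)
attr = integrated_gradients(lambda a: w, acts)
pruned, dropped = prune_top_neurons(acts, attr, k=2)
# Neurons 1 and 4 dominate the attribution and are masked out.
```

In the paper's setting the attributions would be computed through the full transformer on high-confidence predictions, and the pruned neurons are those whose influence tracks dataset-specific shortcuts rather than transferable features; the mask-and-continue-training pattern above only conveys the mechanics.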