Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models
Ameen Ali, Shahar Katz, Lior Wolf, Ivan Titov
arXiv.org Artificial Intelligence
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets, such as reliance on domain-specific correlations, which yield high-confidence predictions without generalizable reasoning. While beneficial in one setting, these dataset-specific mechanisms typically degrade performance when models encounter novel tasks or distributions. In this work, we introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms in transformer-based LLMs. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance without supporting robust, transferable reasoning. Selectively pruning these neurons compels the model to depend on generalizable representations. Evaluated across multiple-choice benchmarks, our pruning-based fine-tuning significantly enhances performance, surpassing prior (non-pruning) adaptation methods.
Jul-15-2025
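The core idea in the abstract, scoring each neuron's contribution via Integrated Gradients and then zeroing the highest-scoring ones, can be sketched with a toy example. This is a minimal illustration, not the paper's implementation: `integrated_gradients` and `prune_top_neurons` are hypothetical helper names, the "model" is a linear head over five neuron activations so the path integral has a closed form, and pruning is a simple zero-mask over the top-k neurons by attribution magnitude.

```python
import numpy as np

def integrated_gradients(f_grad, x, baseline=None, steps=50):
    """Approximate Integrated Gradients attributions for activations x.

    f_grad(a) must return the gradient of the scalar model output
    with respect to the activation vector a.
    """
    if baseline is None:
        baseline = np.zeros_like(x)
    # Midpoint Riemann-sum approximation of the path integral
    # from the baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([f_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def prune_top_neurons(activations, attributions, k):
    """Zero out the k neurons with the largest attribution magnitude."""
    idx = np.argsort(-np.abs(attributions))[:k]
    mask = np.ones_like(activations)
    mask[idx] = 0.0
    return activations * mask, idx

# Toy setting: a linear "head" f(a) = w . a over 5 neuron activations,
# so the gradient is constant and IG reduces to w * (x - baseline).
w = np.array([0.1, 2.0, -0.05, 0.3, 1.5])
acts = np.ones(5)
attr = integrated_gradients(lambda a: w, acts)
pruned, dropped = prune_top_neurons(acts, attr, k=2)
# Neurons 1 and 4 dominate the attribution and are masked out.
```

In the paper's setting the attributions would be computed through the full transformer on high-confidence predictions, and the pruned neurons are those whose influence tracks dataset-specific shortcuts rather than transferable features; the mask-and-continue-training pattern above only conveys the mechanics.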