Dissecting Language Models: Machine Unlearning via Selective Pruning
Nicholas Pochinkov, Nandi Schoots
–arXiv.org Artificial Intelligence
Understanding and shaping the behaviour of Large Language Models (LLMs) is increasingly important as applications become more powerful and more widely adopted. This paper introduces a machine unlearning method specifically designed for LLMs: a selective pruning method that removes neurons based on their relative importance to a targeted capability compared to overall network performance. This approach is a compute- and data-efficient way of identifying and removing the neurons that enable specific behaviours. Our findings reveal that both feed-forward and attention neurons in LLMs are specialised; that is, certain neurons are more crucial than others for specific tasks.
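The pruning criterion described in the abstract can be sketched as follows. This is a minimal illustration only: the per-neuron importance statistic (mean absolute activation), the forget/retain ratio score, and the `prune_frac` threshold are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def selective_prune_mask(forget_acts, retain_acts, prune_frac=0.02, eps=1e-8):
    """Rank neurons by how much more active they are on the targeted
    ('forget') task than on general ('retain') data, and mark the top
    fraction for pruning.

    forget_acts, retain_acts: per-neuron importance statistics
    (e.g. mean absolute activations), shape (n_neurons,).
    Returns a boolean mask where False marks a pruned neuron.
    """
    forget_acts = np.asarray(forget_acts, dtype=float)
    retain_acts = np.asarray(retain_acts, dtype=float)
    # Relative importance: large when a neuron fires mainly on the target task.
    scores = forget_acts / (retain_acts + eps)
    n_prune = max(1, int(prune_frac * scores.size))
    # Indices of the neurons with the highest forget/retain ratio.
    prune_idx = np.argsort(scores)[-n_prune:]
    mask = np.ones(scores.size, dtype=bool)
    mask[prune_idx] = False  # False = zero out this neuron
    return mask
```

Applying the mask to a feed-forward layer's hidden activations (multiplying them by the mask) would then ablate the selected neurons without any retraining, which is what makes this style of unlearning compute- and data-efficient.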
Mar-2-2024