Pruning as a Defense: Reducing Memorization in Large Language Models

Mansi Gupta, Nikhar Waghela, Sarthak Gupta, Shourya Goel, Sanjif Shanmugavelu

arXiv.org Artificial Intelligence 

Large language models have been shown to memorize significant portions of their training data, which they can reproduce when appropriately prompted. This work investigates the impact of simple pruning techniques on this behavior. Our findings reveal that pruning effectively reduces the extent of memorization in LLMs, demonstrating its potential as a foundational approach for mitigating membership inference attacks.

Large language models are known to memorize portions of their training data, which poses significant privacy and security risks. Although various studies have explored the extent of memorization in LLMs, most of these efforts are qualitative (Carlini et al.
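The abstract does not specify which pruning method the authors apply; as a concrete illustration of "simple pruning techniques," here is a minimal sketch of global magnitude (L1) pruning on a causal LM using PyTorch's torch.nn.utils.prune utilities. The model choice (EleutherAI/pythia-70m) and the 30% sparsity level are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

# Load a small open-weight causal LM (illustrative choice, not the paper's).
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

# Target the weight matrix of every linear layer in the network.
targets = [
    (module, "weight")
    for module in model.modules()
    if isinstance(module, torch.nn.Linear)
]

# Zero out the 30% of targeted weights with the smallest absolute
# magnitude, ranked globally across all layers (the sparsity level
# here is an assumed value for illustration).
prune.global_unstructured(
    targets,
    pruning_method=prune.L1Unstructured,
    amount=0.3,
)

# Bake the masks into the weights and drop the pruning re-parametrization.
for module, name in targets:
    prune.remove(module, name)
```

A memorization evaluation along the lines the abstract describes would then compare, before and after pruning, how often the model reproduces training-set continuations verbatim when prompted with their prefixes.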