MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models

Kaichen Huang, Jiahao Huo, Yibo Yan, Kun Wang, Yutao Yue, Xuming Hu

arXiv.org Artificial Intelligence 

In recent years, multimodal large language models (MLLMs) have advanced significantly, integrating more modalities into diverse applications. However, their lack of explainability remains a major barrier to use in scenarios that require decision transparency. Current neuron-level explanation paradigms mainly focus on knowledge localization or on language- and domain-specific analyses, leaving multimodality largely unexplored. To tackle these challenges, we propose MINER, a transferable framework for mining modality-specific neurons (MSNs) in MLLMs, which comprises four stages: modality separation, importance score calculation, importance score aggregation, and modality-specific neuron selection. Extensive experiments across six benchmarks and two representative MLLMs show that (I) deactivating only 2% of MSNs significantly reduces MLLM performance (0.56 → 0.24 for Qwen2-VL, 0.69 → 0.31 for Qwen2-Audio), (II) different modalities converge mainly in the lower layers, (III) MSNs influence how key information from various modalities converges to the last token, and (IV) two intriguing phenomena warrant further investigation, i.e., semantic probing and semantic telomeres. The source code is available at this URL.

Despite these advances (Xiao et al., 2024; Yan et al., 2024), the black-box nature of MLLMs presents challenges, particularly in fields like medical studies (González-Alday et al., 2023), where interpretability is essential. Understanding the decision-making process is vital, making explainability a central focus of ongoing research (Tjoa & Guan, 2020; Zhao et al., 2024). Numerous studies have sought to understand how knowledge is stored in models (Sukhbaatar et al., 2019; Dai et al., 2021; Meng et al., 2022a; Chen et al., 2024a) and how this information influences decision-making (Geva et al., 2020; Petroni et al., 2019). For example, Dai et al. (2021) and Geva et al. (2020) investigate knowledge storage mechanisms, while Wendler et al. (2024) and Zhang et al. (2024) provide insights into layer-level explainability.
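To make the four-stage pipeline concrete, the following is a minimal NumPy sketch of how modality-specific neurons could be selected from per-token importance scores. It is an illustration under stated assumptions, not the paper's exact method: the function name select_modality_specific_neurons, the random placeholder scores, the mean aggregation over a modality's tokens, and applying the 2% ratio per modality are all assumptions made here for clarity.

    import numpy as np

    # Hypothetical setup: importance[i, j] is the importance score of neuron j
    # for input token i, and modality_of[i] labels each token's modality.
    # Random values stand in for scores computed from a real MLLM forward pass.
    rng = np.random.default_rng(0)
    num_tokens, num_neurons = 128, 4096
    importance = rng.random((num_tokens, num_neurons))
    modality_of = rng.choice(["text", "image"], size=num_tokens)

    def select_modality_specific_neurons(importance, modality_of, modality, ratio=0.02):
        """Return indices of the top `ratio` fraction of neurons for one modality.

        Stages loosely mirroring the framework described above:
          1. modality separation       -> keep only tokens of the target modality
          2. importance score calc.    -> per-neuron scores for those tokens (given here)
          3. importance score aggreg.  -> mean over the modality's tokens (an assumption)
          4. neuron selection          -> top-k neurons by aggregated score
        """
        mask = modality_of == modality                  # stage 1
        aggregated = importance[mask].mean(axis=0)      # stages 2-3
        k = max(1, int(ratio * importance.shape[1]))    # e.g., 2% of neurons
        return np.argsort(aggregated)[-k:]              # stage 4

    image_msns = select_modality_specific_neurons(importance, modality_of, "image")
    print(f"selected {image_msns.size} image-specific neurons out of {num_neurons}")

The selected indices could then be used to ablate (zero out) those neurons and measure the performance drop, in the spirit of the deactivation experiments reported above; the exact scoring and aggregation functions used by MINER are defined in the paper itself.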