Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications
–arXiv.org Artificial Intelligence
The integration of Large Language Models (LLMs) into safety-critical domains, such as nuclear engineering, necessitates a deep understanding of their internal reasoning processes. This paper presents a novel methodology for interpreting how an LLM encodes and utilizes domain-specific knowledge, using a Boiling Water Reactor system as a case study. We adapted a general-purpose LLM (Gemma-3-1b-it) to the nuclear domain using a parameter-efficient fine-tuning technique known as Low-Rank Adaptation. By comparing the neuron activation patterns of the base model to those of the fine-tuned model, we identified a sparse set of neurons whose behavior was significantly altered during the adaptation process. To probe the causal role of these specialized neurons, we employed a neuron silencing technique. Our results demonstrate that while silencing most of these specialized neurons individually did not produce a statistically significant effect, deactivating the entire group collectively led to a statistically significant degradation in task performance. Qualitative analysis further revealed that silencing these neurons impaired the model's ability to generate detailed, contextually accurate technical information. This paper provides a concrete methodology for enhancing the transparency of an opaque black-box model, allowing domain expertise to be traced to verifiable neural circuits. This offers a pathway towards achieving nuclear-grade artificial intelligence (AI) assurance, addressing the verification and validation challenges mandated by nuclear regulatory frameworks (e.g., 10 CFR 50 Appendix B), which have limited AI deployment in safety-critical nuclear operations.
arXiv.org Artificial Intelligence
Sep-16-2025
- Country:
- Asia > South Korea
- North America
- Dominican Republic (0.04)
- United States
- California
- Los Angeles County > Long Beach (0.04)
- San Francisco County > San Francisco (0.14)
- Idaho > Bonneville County
- Idaho Falls (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- California
- Genre:
- Research Report
- Experimental Study (0.94)
- New Finding (1.00)
- Research Report
- Industry:
- Energy > Power Industry
- Government > Regional Government (1.00)
- Technology: