Evaluating Neuron Interpretation Methods of NLP Models
Neural Information Processing Systems
Neuron interpretation offers valuable insights into how knowledge is structured within a deep neural network model. Although a number of neuron interpretation methods have been proposed in the literature, the field lacks a comprehensive comparison among them. This gap hampers progress because there are no standardized metrics or benchmarks. The commonly used evaluation metric has limitations, and creating ground truth annotations for neurons is impractical. To address these challenges, we propose an evaluation framework based on voting theory.
Jan-20-2025, 01:55:14 GMT