Finding Transformer Circuits With Edge Pruning
Neural Information Processing Systems
The path to interpreting a language model often proceeds via analysis of circuits: sparse computational subgraphs of the model that capture specific aspects of its behavior. Recent work has automated the task of discovering circuits. Yet these methods have practical limitations, as they rely on either inefficient search algorithms or inaccurate approximations. In this paper, we frame circuit discovery as an optimization problem and propose Edge Pruning as an effective and scalable solution.
Dec-24-2025, 08:42:05 GMT