Finding Transformer Circuits With Edge Pruning

May-26-2025, 18:32:38 GMT–Neural Information Processing Systems

The path to interpreting a language model often proceeds via analysis of circuits: sparse computational subgraphs of the model that capture specific aspects of its behavior. Recent work has automated the task of discovering circuits. Yet these methods have practical limitations, as they rely on either inefficient search algorithms or inaccurate approximations. In this paper, we frame circuit discovery as an optimization problem and propose Edge Pruning as an effective and scalable solution. Our method finds circuits in GPT-2 that use fewer than half as many edges as circuits found by previous methods while being equally faithful to the full model's predictions on standard circuit-finding tasks.

edge pruning, large language model, machine learning

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.61)
  - Natural Language
    - Large Language Model (0.31)
    - Chatbot (0.31)
  - Machine Learning > Neural Networks
    - Deep Learning (0.31)