Jet Expansions of Residual Computation
Yihong Chen, Xiangxiang Xu, Yao Lu, Pontus Stenetorp, Luca Franceschi
We introduce a framework for expanding residual computational graphs using jets, operators that generalize truncated Taylor series. Our method provides a systematic approach to disentangling the contributions of different computational paths to model predictions. In contrast to existing techniques such as distillation, probing, or early decoding, our expansions rely solely on the model itself and require no data, training, or sampling from the model. We demonstrate how our framework grounds and subsumes logit lens, reveals a (super-)exponential path structure in the recursive residual depth, and opens up several applications. These include sketching a transformer large language model with n-gram statistics extracted from its computations, and indexing the model's levels of toxicity knowledge. Our approach enables data-free analysis of residual computation for model interpretability, development, and evaluation. The project website can be found here.

Machine learning models, particularly large-scale foundation models, have become increasingly prevalent and impactful across a wide range of domains (Wei et al., 2021; Bommasani et al., 2023; Touvron et al., 2023b). While delivering strong results, their black-box nature has motivated techniques for assessing their behavior and gaining insight into their internal mechanisms. In this space, mechanistic interpretability (MI) (see e.g. Bereska & Gavves, 2024; Ferrando et al., 2024, for recent surveys) has emerged as an alternative to more classic local attribution methods such as SHAP (Lundberg, 2017) or integrated gradients (Sundararajan et al., 2017).
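To make the expansion concrete, below is a minimal sketch of an order-1 jet decomposition of a toy two-block residual stack. This is not the paper's implementation: the network, its shapes, and the tanh blocks are illustrative assumptions, and `jax.jvp` stands in as the first-order jet operator (the paper's jets generalize to higher orders and to full transformer graphs).

```python
# Minimal sketch (assumed, not the paper's code): order-1 jet expansion of a
# toy two-block residual stack, with jax.jvp acting as the first-order jet.
import jax
import jax.numpy as jnp

d = 8
key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
W1 = jax.random.normal(k1, (d, d)) / jnp.sqrt(d)
W2 = jax.random.normal(k2, (d, d)) / jnp.sqrt(d)
WU = jax.random.normal(k3, (d, d)) / jnp.sqrt(d)  # stand-in "unembedding"

f1 = lambda x: jnp.tanh(W1 @ x)  # residual block 1
f2 = lambda x: jnp.tanh(W2 @ x)  # residual block 2

def logits(x0):
    x1 = x0 + f1(x0)             # residual stream after block 1
    x2 = x1 + f2(x1)             # residual stream after block 2
    return WU @ x2

x0 = jax.random.normal(k4, (d,))

# Expand f2 around x0, treating block 1's update h = f1(x0) as the
# perturbation: f2(x0 + h) ~= y0 + y1, where jax.jvp returns the
# zeroth-order term y0 = f2(x0) and first-order term y1 = J_f2(x0) h.
y0, y1 = jax.jvp(f2, (x0,), (f1(x0),))

# Path-wise decomposition of the logits, truncated at first order:
#   WU @ x0      direct path (a logit-lens readout of the initial stream)
#   WU @ f1(x0)  path through block 1 only
#   WU @ y0      path through block 2 only
#   WU @ y1      path through block 1 then block 2
approx = WU @ (x0 + f1(x0) + y0 + y1)
print(jnp.max(jnp.abs(logits(x0) - approx)))  # small truncation error
```

The four decomposed terms correspond to the 2^2 computational paths through two residual blocks, a small instance of the exponential path structure the abstract refers to; the zeroth-order direct term is exactly a logit-lens style readout of the initial residual stream.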
arXiv.org Artificial Intelligence
Oct-8-2024