Jet Expansions of Residual Computation
Yihong Chen, Xiangxiang Xu, Yao Lu, Pontus Stenetorp, Luca Franceschi
We introduce a framework for expanding residual computational graphs using jets, operators that generalize truncated Taylor series. Our method provides a systematic approach to disentangling the contributions of different computational paths to model predictions. In contrast to existing techniques such as distillation, probing, or early decoding, our expansions rely solely on the model itself and require no data, training, or sampling from the model. We demonstrate how our framework grounds and subsumes logit lens, reveals a (super-)exponential path structure in the recursive residual depth, and opens up several applications. These include sketching a transformer large language model with n-gram statistics extracted from its computations, and indexing the model's levels of toxicity knowledge. Our approach enables data-free analysis of residual computation for model interpretability, development, and evaluation. The project website can be found here.

Machine learning models, particularly large-scale foundation models, have become increasingly prevalent and impactful across a wide range of domains (Wei et al., 2021; Bommasani et al., 2023; Touvron et al., 2023b). While delivering strong results, their black-box nature has motivated techniques for assessing their behavior and gaining insight into their internal mechanisms. In this space, mechanistic interpretability (MI) (see e.g. Bereska & Gavves, 2024; Ferrando et al., 2024, for recent surveys) has emerged as an alternative to more classic local attribution methods such as SHAP (Lundberg, 2017) or integrated gradients (Sundararajan et al., 2017).
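To make the expansion concrete, below is a minimal sketch of an order-1 jet decomposition of a toy two-block residual stack. This is not the paper's implementation: the network, its shapes, and the tanh blocks are illustrative assumptions, and `jax.jvp` stands in as the first-order jet operator (the paper's jets generalize to higher orders and to full transformer graphs).

```python
# Minimal sketch (assumed, not the paper's code): order-1 jet expansion of a
# toy two-block residual stack, with jax.jvp acting as the first-order jet.
import jax
import jax.numpy as jnp

d = 8
key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
W1 = jax.random.normal(k1, (d, d)) / jnp.sqrt(d)
W2 = jax.random.normal(k2, (d, d)) / jnp.sqrt(d)
WU = jax.random.normal(k3, (d, d)) / jnp.sqrt(d)  # stand-in "unembedding"

f1 = lambda x: jnp.tanh(W1 @ x)  # residual block 1
f2 = lambda x: jnp.tanh(W2 @ x)  # residual block 2

def logits(x0):
    x1 = x0 + f1(x0)             # residual stream after block 1
    x2 = x1 + f2(x1)             # residual stream after block 2
    return WU @ x2

x0 = jax.random.normal(k4, (d,))

# Expand f2 around x0, treating block 1's update h = f1(x0) as the
# perturbation: f2(x0 + h) ~= y0 + y1, where jax.jvp returns the
# zeroth-order term y0 = f2(x0) and first-order term y1 = J_f2(x0) h.
y0, y1 = jax.jvp(f2, (x0,), (f1(x0),))

# Path-wise decomposition of the logits, truncated at first order:
#   WU @ x0      direct path (a logit-lens readout of the initial stream)
#   WU @ f1(x0)  path through block 1 only
#   WU @ y0      path through block 2 only
#   WU @ y1      path through block 1 then block 2
approx = WU @ (x0 + f1(x0) + y0 + y1)
print(jnp.max(jnp.abs(logits(x0) - approx)))  # small truncation error
```

The four decomposed terms correspond to the 2^2 computational paths through two residual blocks, a small instance of the exponential path structure the abstract refers to; the zeroth-order direct term is exactly a logit-lens style readout of the initial residual stream.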
arXiv.org Artificial Intelligence
Oct-8-2024