Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

Neural Information Processing Systems 

These failure cases are particularly troubling because they are not systematic; it is very difficult to predict when, for example, the order of information seemingly randomly causes a model to fail [Pezeshkpour and Hruschka, 2023, Liu et al., 2024, Li and Gao, 2024, Zheng et al.,
