Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers
