Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers