Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers