Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
