Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
