Attention is Naturally Sparse with Gaussian Distributed Input

Open in new window