Appendix: Remodel Self-Attention with Gaussian Kernel and Nyström Method
