Appendix: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Neural Information Processing Systems
Figure 1: Validation loss changes for 50k steps.
Consider a finite sequence {X_k} of independent, random, self-adjoint matrices with dimension n. For a certain n-by-n orthogonal matrix H (H H^T is a diagonal matrix) and an n-by-d uniform sub-sampling matrix S (as defined in Definition 1 in the main paper), we denote the sketching matrix Π := nS. We aim to show that H Π Π^T H^T satisfies the (1/2, δ)-MA property for H H^T by the following lemma. The first inequality of the preceding display holds because H is an orthogonal matrix. It is easy to check that C C = B(I PΠ)B^T.
Feb-7-2026, 13:05:48 GMT