Appendix Table of Contents

Neural Information Processing Systems 

The sparse part covers the noise, while the low-rank part recovers the principle components. However, it still requires too many iterations to be used in each training step. Scatterbrain yields an unbiased estimate of the attention matrix, and we can also understand how its variance changes. On the other hand, Scatterbrain's generality allows it This is similar in spirit to low-rank attention (Linformer) and global tokens, but it is not a low-rank approximation due to the non-linearity between the two attention steps. LSH has been used in estimation problem as well [12, 11].

Similar Docs  Excel Report  more

TitleSimilaritySource
None found