provides a theoretical basis for building the proposed Multi-linear attention

Neural Information Processing Systems 

We highly appreciate the reviewers' invaluable comments and suggestions. Our responses are as follows.

To Reviewer 1: Theorem 3.1 assumes that Q, K, and V can each be linearly represented by a set of basis vectors. Under this condition, we prove Eq. 4 in our paper. We used 3 or 4 cores to train our model.
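A brief sketch of why such a condition is enough for a result of this kind. The notation is illustrative and assumed here, so it may differ from the form of Eq. 4 in the paper: E stacks the basis vectors e_1, ..., e_n as rows, C_V is a hypothetical coefficient matrix for V, and d is the key dimension of standard scaled dot-product attention. This sketch only uses the representation of V.

```latex
% Assumed notation: rows of V are linear combinations of the basis rows of E.
\begin{align}
V &= C_V E, \qquad E = (e_1, \dots, e_n)^\top \\
\mathrm{Attention}(Q, K, V)
  &= \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d}}\right) V
   = \left[\mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d}}\right) C_V\right] E
   = M E
\end{align}
```

Here M is the resulting coefficient matrix, so each row of the attention output is again a linear combination of the same basis vectors.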
