Orthogonal Transformer: An Efficient Vision Transformer Backbone with Token Orthogonalization

A Proof of Theorem 1

Neural Information Processing Systems 

Herein we provide the proof of Theorem 1 in the main text.

Proof A.2. We can construct the Householder matrix with vector u = … Hence Q is the product of n − 1 orthogonal Householder matrices.

Proof A.5. With Lemma A.3, we can upper triangularize the given real orthogonal matrix A as: H …

We train the models with two common settings: (1) The AdamW optimizer is used with a learning rate of 0.0001, a weight decay of 0.05, and a batch size of 16. We apply the Orthogonal Transformer pretrained on ImageNet-1K as the backbone network. Fig. I and Fig. II show the detailed architectures of the convolutional patch embedding and the … The last convolution has a kernel size of 1×1, followed by a LayerNorm layer.
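The factorization referenced in Proof A.2 rests on the standard fact that a Householder matrix H = I − 2uu^T/(u^T u) is orthogonal, so any product of such matrices is orthogonal as well. As a minimal NumPy sketch of this construction (an illustration of the general fact, not the paper's implementation; the function names and the choice of random reflection vectors are ours), the snippet below builds n − 1 Householder matrices and checks that their product Q satisfies Q^T Q = I:

```python
import numpy as np

def householder(u):
    """Householder reflection H = I - 2 u u^T / (u^T u); orthogonal and symmetric."""
    u = u.reshape(-1, 1)
    return np.eye(u.shape[0]) - 2.0 * (u @ u.T) / (u.T @ u)

def product_of_householders(vectors):
    """Multiply the Householder matrices of the given vectors; the result is orthogonal."""
    n = vectors[0].shape[0]
    Q = np.eye(n)
    for u in vectors:
        Q = Q @ householder(u)
    return Q

rng = np.random.default_rng(0)
n = 4
# n - 1 reflection vectors, matching the count in Proof A.2.
us = [rng.standard_normal(n) for _ in range(n - 1)]
Q = product_of_householders(us)
# Orthogonality holds up to floating-point error: Q^T Q = I.
print(np.allclose(Q.T @ Q, np.eye(n)))
```

Conversely, Theorem 1 states that every real orthogonal matrix arises this way, which Proof A.5 obtains by triangularizing A with Householder matrices.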
