Orthogonal Transformer: An Efficient Vision Transformer Backbone with Token Orthogonalization

A Proof of Theorem 1
Herein we provide the proof of Theorem 1 in the main text.

Proof A.2. We can construct the Householder matrix $H = I - 2uu^\top/(u^\top u)$ from a vector $u$, showing that $Q$ is the product of $n-1$ orthogonal Householder matrices.

Proof A.5. With Lemma A.3, we can upper triangularize the given real orthogonal matrix $A$ as $H_{n-1} \cdots H_1 A = R$. Since $A$ and every $H_i$ are orthogonal, $R$ is an orthogonal upper triangular matrix and is therefore diagonal with entries $\pm 1$; inverting the reflections gives $A = H_1 \cdots H_{n-1} R$, which completes the proof.
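To make the triangularization step in Proofs A.2 and A.5 explicit, the display below sketches one Householder step; the particular choice of $u$ is the standard textbook construction and should be read as a sketch rather than the paper's exact formula.

\[
u = a_1 - \|a_1\|_2\, e_1, \qquad
H_1 = I - \frac{2\,uu^\top}{u^\top u}, \qquad
H_1 a_1 = \|a_1\|_2\, e_1,
\]

where $a_1$ denotes the first column of $A$ (for orthogonal $A$, $\|a_1\|_2 = 1$). Repeating this step on the trailing $(n-1) \times (n-1)$ block yields $H_{n-1} \cdots H_1 A = R$ with $R$ upper triangular, and since each $H_i$ satisfies $H_i^\top = H_i$ and $H_i^2 = I$, the factorization inverts to $A = H_1 \cdots H_{n-1} R$.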
B Implementation Details

We train the models with two common settings: (1) the AdamW optimizer is used with a learning rate of 0.0001, a weight decay of 0.05, and a batch size of 16. We apply the Orthogonal Transformer pretrained on ImageNet-1K as the backbone network.
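As an illustration of these training settings, a minimal PyTorch sketch is given below. The stand-in backbone and dummy data are hypothetical placeholders; only the AdamW hyperparameters (lr 0.0001, weight decay 0.05) and the batch size of 16 come from the text.

```python
import torch
from torch import nn
from torch.optim import AdamW

# Stand-in backbone (hypothetical): any nn.Module works for this sketch; the
# real Orthogonal Transformer architecture is defined in the main text.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=4),  # 224x224 -> 56x56 spatial
    nn.Flatten(),
    nn.Linear(64 * 56 * 56, 1000),
)

# Optimizer settings stated in the text: AdamW, lr 0.0001, weight decay 0.05.
optimizer = AdamW(backbone.parameters(), lr=1e-4, weight_decay=0.05)

# Dummy batch standing in for the real data pipeline; batch size 16 as stated.
images = torch.randn(16, 3, 224, 224)
targets = torch.randint(0, 1000, (16,))

# One training step: forward, loss, backward, update.
logits = backbone(images)
loss = nn.functional.cross_entropy(logits, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```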
Fig. I and Fig. II show the detailed architectures of the convolutional patch embedding and the … The last convolution has a kernel size of $1 \times 1$, followed by a LayerNorm layer.
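As a sketch of a convolutional patch embedding of the kind described, the module below ends with a $1 \times 1$ convolution followed by LayerNorm, as stated; the number of layers, channel widths, and strides are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from torch import nn

class ConvPatchEmbed(nn.Module):
    """Sketch of a convolutional patch embedding.

    Only the final 1x1 convolution followed by LayerNorm is taken from the
    text; the preceding strided 3x3 convolutions and the channel widths are
    illustrative assumptions.
    """

    def __init__(self, in_chans=3, embed_dim=96):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 2, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(embed_dim // 2, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=1),  # last conv: 1x1 kernel
        )
        self.norm = nn.LayerNorm(embed_dim)  # LayerNorm applied after the 1x1 conv

    def forward(self, x):
        x = self.proj(x)                  # (B, C, H/4, W/4)
        x = x.flatten(2).transpose(1, 2)  # (B, N, C) token sequence
        return self.norm(x)

tokens = ConvPatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 3136, 96])
```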