Searching the Search Space of Vision Transformer: Supplementary Material

Minghao Chen

Neural Information Processing Systems 

The details include searching in the searched space. The Q-K-V dimension can be smaller than the embedding dimension. In this section, we present the details of supernet training and the evolutionary algorithm. Finally, we update the corresponding weights with the fused gradients. Alg. 2 shows the evolution search in our method.
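Since Alg. 2 is not reproduced here, the following is only a generic sketch of an evolutionary architecture search, not the authors' algorithm. The search space (`SEARCH_SPACE`), its dimension names and ranges, the mutation and crossover rules, and the scoring function `toy_fitness` are all hypothetical and chosen for illustration. In a real setting the fitness would presumably be the validation accuracy of a subnet that inherits its weights from the trained supernet.

```python
import random

# Hypothetical search space (illustrative names and values only).
# Note that the Q-K-V dimension is allowed to be smaller than the
# embedding dimension.
SEARCH_SPACE = {
    "embed_dim": [192, 216, 240],
    "qkv_dim": [128, 160, 192],
    "depth": [12, 13, 14],
}


def sample_architecture(rng):
    """Draw one architecture uniformly at random from the search space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def mutate(arch, rng, prob=0.3):
    """Resample each choice independently with probability `prob`."""
    child = dict(arch)
    for name, choices in SEARCH_SPACE.items():
        if rng.random() < prob:
            child[name] = rng.choice(choices)
    return child


def crossover(a, b, rng):
    """Build a child by picking each choice from one of the two parents."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def toy_fitness(arch):
    """Placeholder score (assumption): stands in for subnet evaluation."""
    return arch["embed_dim"] + arch["qkv_dim"] + 10 * arch["depth"]


def evolution_search(fitness, generations=10, population_size=20, top_k=5, seed=0):
    """Keep the top-k architectures each generation and refill the
    population with mutated or crossed-over children of those parents."""
    rng = random.Random(seed)
    population = [sample_architecture(rng) for _ in range(population_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:top_k]
        children = []
        while len(children) < population_size - top_k:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(parents), rng))
            else:
                a, b = rng.sample(parents, 2)
                children.append(crossover(a, b, rng))
        population = parents + children
    return max(population, key=fitness)


best = evolution_search(toy_fitness)
print(best)
```

Because the top-k parents are carried over unchanged, the best score in the population never decreases from one generation to the next. Resource constraints such as a parameter budget are omitted in this sketch; they could be added by rejecting sampled children that violate them.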