Supplementary material for Space-time Mixing Attention for Video Transformer