Supplementary Materials for "Temporal-attentive Covariance Pooling Networks for Video Recognition" Zilin Gao