Supplementary Material for "Diversifying Spatial-Temporal Perception for Video Domain Generalization"

Yipeng Gao, Jiaming Zhou

Neural Information Processing Systems 

In this section, we compare our proposed STDN with AVRNA [1]. Since AVRNA involves the audio modality whereas our work focuses on the RGB modality, we implement two variants of AVRNA for comparison:

Hard Norm Alignment loss (HNA): apply the HNA loss (Eq.

Relative Norm Alignment loss (RNA): first divide the source domain into two subdomains, following previous domain adaptation works [2, 3, 4] (as the target domain is not accessible), and then apply the RNA loss (Eq.

We implement both variants on top of the Temporal Relation Network (TRN) [5], and all experiments are conducted under the same augmentation setting (i.e., without MixStyle). As shown in Table S1,

Table S1: Comparison with AVRNA.
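To make the two variants concrete, the following is a minimal sketch of norm-alignment losses on RGB features. The equations referenced in the text are not reproduced here, so the loss forms are assumptions: HNA is taken to penalize the squared gap between each group's mean feature norm and a fixed target value, and RNA is taken to penalize the squared deviation of the ratio of two mean feature norms from 1. The function names, the `target_norm` parameter, and the toy features are hypothetical, and the way the source domain is split into two subdomains is left to the caller.

```python
import math


def mean_feature_norm(features):
    """Mean L2 norm over a batch of feature vectors (a list of lists of floats)."""
    return sum(math.sqrt(sum(x * x for x in f)) for f in features) / len(features)


def hna_loss(feature_groups, target_norm):
    """Assumed HNA form: pull each group's mean norm toward a fixed value."""
    return sum((mean_feature_norm(g) - target_norm) ** 2 for g in feature_groups)


def rna_loss(features_a, features_b, eps=1e-12):
    """Assumed RNA form: push the ratio of the two mean norms toward 1."""
    ratio = mean_feature_norm(features_a) / (mean_feature_norm(features_b) + eps)
    return (ratio - 1.0) ** 2


# Toy features standing in for two source subdomains (hypothetical values).
sub_a = [[3.0, 4.0], [6.0, 8.0]]   # norms 5 and 10, mean norm 7.5
sub_b = [[0.0, 5.0], [0.0, 10.0]]  # norms 5 and 10, mean norm 7.5
sub_c = [[0.0, 15.0]]              # mean norm 15

balanced = rna_loss(sub_a, sub_b)          # mean norms match, loss is ~0
imbalanced = rna_loss(sub_a, sub_c)        # ratio 0.5, loss is ~0.25
hard = hna_loss([sub_a, sub_c], 7.5)       # only sub_c deviates from the target
```

In a training loop, such a term would typically be added to the classification loss with a weighting coefficient, with the features taken from the backbone (TRN in the setting above) rather than from toy lists.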
