Audio-Driven Co-Speech Gesture Video Generation (Supplemental Document)