Audio-Driven Co-Speech Gesture Video Generation (Supplemental Document)

Neural Information Processing Systems 

In this supplemental document, we present the following contents: 1) the proof of Theorem 1 (the unique Cholesky decomposition theorem) (Sec. L); 13) the licenses of the existing assets involved in this paper (Sec.

In the main paper, to ease the constraint in the quantization process, we use the unique Cholesky decomposition theorem [13] to transform the covariance matrix C into the factorial covariance L by the following theorem: Theorem 1. Because C has a positive determinant, and the decomposition C = L L^T gives det(C) = det(L)^2, which is the squared product of the diagonal entries of the triangular factor L, the diagonal entries of L must be non-zero.

The output of the GPT [10] model at the t-th time step is the probability of choosing each codebook entry; the entry with the largest probability serves as the predicted motion code for the next time step.
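The two computations described above can be illustrated with a minimal sketch. It is not the implementation used in the paper. The function names (`cholesky_lower`, `reconstruct`, `greedy_next_code`), the 2x2 example matrix, and the example probabilities are all hypothetical, and the sketch assumes the input matrix is symmetric positive definite and that a probability vector over codebook entries is already available.

```python
import math


def cholesky_lower(c):
    """Return a lower-triangular L with positive diagonal such that C = L L^T.

    Assumes c is a symmetric matrix given as a list of lists.
    Raises ValueError when a non-positive pivot is encountered,
    i.e. when c is not positive definite.
    """
    n = len(c)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Partial dot product of rows i and j of L computed so far.
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = c[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(d)  # positive diagonal entry
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l


def reconstruct(l):
    """Compute L L^T to check the factorization."""
    n = len(l)
    return [
        [sum(l[i][k] * l[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def greedy_next_code(probs):
    """Return the index of the codebook entry with the largest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


if __name__ == "__main__":
    C = [[4.0, 2.0], [2.0, 3.0]]        # hypothetical covariance matrix
    L = cholesky_lower(C)
    print(L)                            # lower-triangular factor
    print(reconstruct(L))               # should match C up to rounding
    print(greedy_next_code([0.1, 0.7, 0.2]))  # hypothetical GPT output
```

For the example matrix, the factor has diagonal entries 2 and sqrt(2), so det(C) = 8 equals the squared product of the diagonal of L, which is consistent with the argument that these entries cannot be zero. Picking the largest probability is a greedy choice; when two entries tie, this sketch returns the first one.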