Co-Speech Gesture Synthesis using Discrete Gesture Token Learning