ExFace: Expressive Facial Control for Humanoid Robots with Diffusion Transformers and Bootstrap Training

Zhang, Dong; Peng, Jingwei; Jiao, Yuyang; Gu, Jiayuan; Yu, Jingyi; Chen, Jiahao

arXiv.org Artificial Intelligence 

This paper presents a novel Expressive Facial Control (ExFace) method based on Diffusion Transformers, which achieves precise mapping from human facial blendshapes to bionic robot motor control. By incorporating a model bootstrap training strategy, our approach not only generates high-quality facial expressions but also significantly improves accuracy and smoothness. Experimental results demonstrate that the proposed method outperforms previous methods in terms of accuracy, frames per second (FPS), and response time. Furthermore, we develop the ExFace dataset, which is driven by human facial data. ExFace shows excellent real-time performance and natural expression rendering in applications such as robot performances and human-robot interaction, offering a new solution for bionic robot interaction.

Facial expressions are integral to human communication and play a pivotal role in conveying emotions, attitudes, and intentions. Prior research shows that individuals rely on a variety of facial expressions to both convey and interpret affective states [1].