EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation
Tianheng Zhu, Yinfeng Yu, Liejun Wang, Fuchun Sun, Wendong Zheng
arXiv.org Artificial Intelligence
This paper presents EGSTalker, a real-time audio-driven talking head generation framework based on 3D Gaussian Splatting (3DGS). Designed to enhance both speed and visual fidelity, EGSTalker requires only 3-5 minutes of training video to synthesize high-quality facial animations. The framework comprises two key stages: static Gaussian initialization and audio-driven deformation. In the first stage, a multi-resolution hash triplane and a Kolmogorov-Arnold Network (KAN) are used to extract spatial features and construct a compact 3D Gaussian representation. In the second stage, we propose an Efficient Spatial-Audio Attention (ESAA) module to fuse audio and spatial cues, while KAN predicts the corresponding Gaussian deformations. Extensive experiments demonstrate that EGSTalker achieves rendering quality and lip-sync accuracy comparable to state-of-the-art methods, while significantly outperforming them in inference speed. These results highlight EGSTalker's potential for real-time multimedia applications.
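The abstract does not give the internals of the ESAA module or the KAN deformation predictor, but the described fusion step can be illustrated generically. The sketch below is a hypothetical minimal version in NumPy: per-Gaussian spatial features act as attention queries over audio frame features, and a plain linear head (standing in for the paper's KAN) maps the fused features to per-Gaussian deformation offsets. All names, dimensions, and the 10-parameter deformation layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_audio_attention(spatial_feats, audio_feats):
    """Fuse per-Gaussian spatial features (queries) with audio frame
    features (keys/values) via scaled dot-product attention.
    A generic stand-in for the paper's ESAA module (assumption)."""
    d = spatial_feats.shape[-1]
    scores = spatial_feats @ audio_feats.T / np.sqrt(d)   # (N, T)
    weights = softmax(scores, axis=-1)                    # rows sum to 1
    return weights @ audio_feats                          # (N, d)

rng = np.random.default_rng(0)
N, T, d = 1024, 16, 32            # Gaussians, audio frames, feature dim (assumed)
spatial = rng.standard_normal((N, d))
audio = rng.standard_normal((T, d))

fused = spatial_audio_attention(spatial, audio)

# Linear head standing in for the KAN deformation predictor (assumption):
# maps fused features to per-Gaussian offsets, e.g. 3 (position) +
# 4 (rotation quaternion) + 3 (scale) = 10 deformation parameters.
W = rng.standard_normal((d, 10)) * 0.01
deformation = fused @ W
print(deformation.shape)          # (1024, 10)
```

In the real framework the attention output would drive the 3DGS rasterizer each frame, which is where the reported real-time inference advantage comes from.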
Oct-13-2025
- Country:
- Asia
- Europe > United Kingdom
- England > Greater London > London (0.04)
- North America > United States (0.04)
- Genre:
- Research Report > Promising Solution (0.48)
- Technology:
- Information Technology
- Architecture > Real Time Systems (0.93)
- Artificial Intelligence
- Machine Learning
- Neural Networks (0.46)
- Statistical Learning (0.47)
- Natural Language > Chatbot (0.63)
- Representation & Reasoning (1.00)
- Vision (1.00)