Joint Learning of Depth and Appearance for Portrait Image Animation
Ji, Xinya, Zoss, Gaspard, Chandran, Prashanth, Yang, Lingchen, Cao, Xun, Solenthaler, Barbara, Bradley, Derek
arXiv.org Artificial Intelligence
2D portrait animation has advanced significantly in recent years. Much recent research leverages the prior knowledge embedded in large generative diffusion models for high-quality image manipulation. However, most methods focus only on generating RGB images as output, and the co-generation of consistent visual and 3D output remains largely under-explored. In this work, we propose to jointly learn visual appearance and depth in a diffusion-based portrait image generator. Our method embraces the end-to-end diffusion paradigm and introduces a new architecture suited to learning this conditional joint distribution, consisting of a reference network and a channel-expanded diffusion backbone. Once trained, our framework can be efficiently adapted to various downstream applications, such as facial depth-to-image and image-to-depth generation, portrait relighting, and audio-driven talking-head animation with consistent 3D output.
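The abstract mentions a channel-expanded diffusion backbone for jointly generating appearance and depth. One common way to expand a pretrained backbone's channels (a hedged sketch; the paper's exact scheme is not given in the abstract, and `expand_input_channels` is a hypothetical helper) is to append zero-initialized weights for the extra depth-latent channels, so the expanded layer initially behaves exactly like the pretrained one:

```python
import numpy as np

def expand_input_channels(weight: np.ndarray, extra_in: int) -> np.ndarray:
    """Append zero-initialized input channels to a conv weight tensor
    of shape (out_c, in_c, kh, kw). Zero init means the new channels
    contribute nothing at first, preserving pretrained behavior.
    (Illustrative sketch, not the paper's verified implementation.)"""
    out_c, in_c, kh, kw = weight.shape
    expanded = np.zeros((out_c, in_c + extra_in, kh, kw), dtype=weight.dtype)
    expanded[:, :in_c] = weight  # copy pretrained RGB-latent weights
    return expanded

# Pretrained first conv of a latent-diffusion UNet: 4 RGB-latent channels in.
w_rgb = np.random.randn(320, 4, 3, 3).astype(np.float32)
# Expand to also accept 4 depth-latent channels (8 input channels total).
w_joint = expand_input_channels(w_rgb, extra_in=4)
```

The same trick applies symmetrically to the output convolution when the model must emit depth latents alongside RGB latents.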
Jan-15-2025
- Genre:
  - Research Report (0.64)
- Industry:
  - Energy (0.66)
- Technology:
  - Information Technology
  - Artificial Intelligence
  - Machine Learning > Neural Networks (0.94)
  - Natural Language (0.90)
  - Vision > Face Recognition (0.68)
  - Graphics > Animation (0.95)
  - Sensing and Signal Processing > Image Processing (1.00)