SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars

Lee, Jaeseong, Kang, Taewoong, Bühler, Marcel C., Kim, Min-Jung, Hwang, Sungwon, Hyung, Junha, Jang, Hyojin, Choo, Jaegul

arXiv.org Artificial Intelligence 

Recent advancements in head avatar rendering using Gaussian primitives have achieved remarkably high-fidelity results. Although precise head geometry is crucial for applications such as mesh reconstruction and relighting, current methods struggle to capture intricate geometric details and to render unseen poses because they rely on similarity transformations, which cannot represent the stretch and shear transforms essential for detailed geometric deformation. To address this, we propose SurFhead, a novel method that reconstructs riggable head geometry from RGB videos using 2D Gaussian surfels. Surfels offer well-defined geometric properties, such as precise depth from fixed ray intersections and normals derived from their surface orientation, making them advantageous over their 3D counterparts. SurFhead ensures high-fidelity rendering of both normals and images, even under extreme poses, by leveraging classical mesh-based deformation transfer and affine transformation interpolation. It introduces precise geometric deformation and blends surfels through the polar decomposition of transformations, including those affecting normals. Our key contribution lies in bridging classical graphics techniques, such as mesh-based deformation, with modern Gaussian primitives, achieving state-of-the-art geometry reconstruction and rendering quality. Unlike previous avatar rendering approaches, SurFhead enables efficient reconstruction driven by Gaussian primitives while preserving high-fidelity geometry.

The construction of personalized head avatars has seen rapid advancements in both research and industry. Among the most notable developments in this field is the Codec Avatar family (Ma et al., 2021; Saito et al., 2024), which aims to reconstruct highly detailed, movie-quality head avatars from high-cost data captured with head-mounted cameras or in studios.
This approach has spurred significant research efforts to bridge the gap between high-cost and low-cost capture systems by using only RGB video setups. Neural Radiance Fields (NeRFs) (Mildenhall et al., 2021) have further accelerated these efforts with their topology-agnostic representation. As a result, numerous NeRF-based methods (Gafni et al., 2021; Athar et al., 2022; Zielonka et al., 2023b) for constructing head avatars from RGB videos have emerged, demonstrating the potential to approach the quality of high-cost systems (Ma et al., 2021; Yang et al., 2023; Saito et al., 2024).
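The affine blending idea summarized in the abstract, splitting each affine Jacobian into a rotation and a stretch/shear factor via polar decomposition, interpolating the two parts separately, and transforming normals by the inverse-transpose, can be sketched as follows. This is a minimal NumPy/SciPy illustration under stated assumptions, not the authors' implementation: the function names are ours, the rotation average uses SciPy's chordal mean rather than whatever interpolant the paper employs, and Jacobians are assumed to have positive determinant.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def polar_decompose(A):
    """Split a 3x3 affine Jacobian into A = R @ S via the SVD,
    with R a proper rotation and S a symmetric stretch/shear factor
    (assumes det(A) > 0)."""
    U, sigma, Vt = np.linalg.svd(A)
    R = U @ Vt                      # rotational part
    S = Vt.T @ np.diag(sigma) @ Vt  # symmetric stretch/shear part
    return R, S


def blend_affines(mats, weights):
    """Blend affine Jacobians: average rotations on SO(3) (chordal mean
    here, as an assumption), blend stretch factors linearly, recompose."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    Rs, Ss = zip(*(polar_decompose(A) for A in mats))
    R = Rotation.from_matrix(np.stack(Rs)).mean(weights=w).as_matrix()
    S = sum(wi * Si for wi, Si in zip(w, Ss))
    return R @ S


def transform_normal(A, n):
    """Normals deform by the inverse-transpose of the affine part,
    then get renormalized."""
    m = np.linalg.inv(A).T @ n
    return m / np.linalg.norm(m)
```

Blending the rotation and stretch factors separately is what lets interpolated transforms stay free of the candy-wrapper artifacts that naive linear blending of full affine matrices produces: the rotational part never degenerates, while stretch and shear (which similarity transforms cannot express at all) are still carried through.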