Shah, Tanmay
Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures
Bühler, Marcel C., Li, Gengyan, Wood, Erroll, Helminger, Leonhard, Chen, Xu, Shah, Tanmay, Wang, Daoye, Garbin, Stephan, Orts-Escolano, Sergio, Hilliges, Otmar, Lagun, Dmitry, Riviere, Jérémy, Gotardo, Paulo, Beeler, Thabo, Meka, Abhimitra, Sarkar, Kripasindhu
Volumetric modeling and neural radiance field representations have revolutionized 3D face capture and photorealistic novel view synthesis. However, these methods often require hundreds of multi-view input images and are thus inapplicable to cases with fewer than a handful of inputs. We present a novel volumetric prior on human faces that allows for high-fidelity expressive face modeling from as few as three input views captured in the wild. Our key insight is that an implicit prior trained on synthetic data alone can generalize to extremely challenging real-world identities and expressions and render novel views with fine idiosyncratic details like wrinkles and eyelashes. We leverage a 3D Morphable Face Model to synthesize a large training set, rendering each identity with different expressions, hair, clothing, and other assets. We then train a conditional Neural Radiance Field prior on this synthetic dataset and, at inference time, fine-tune the model on a very sparse set of real images of a single subject. On average, the fine-tuning requires only three inputs to cross the synthetic-to-real domain gap. The resulting personalized 3D model reconstructs strong idiosyncratic facial expressions and outperforms the state-of-the-art in high-quality novel view synthesis of faces from sparse inputs in terms of perceptual and photometric quality.
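The Neural Radiance Field prior underlying this work renders a pixel by compositing densities and colors sampled along a camera ray. As a minimal, hedged sketch (this is the standard NeRF volume-rendering quadrature, not the paper's actual implementation; all names here are illustrative):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite per-sample densities and colors along one ray using the
    standard NeRF quadrature: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    where T_i is the accumulated transmittance before sample i."""
    alphas = 1.0 - np.exp(-sigmas * deltas)       # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)      # transmittance after each sample
    trans = np.concatenate([[1.0], trans[:-1]])   # shift so T_i depends only on j < i
    weights = trans * alphas                      # contribution of each sample
    return weights @ colors                       # rendered RGB, shape (3,)
```

Fine-tuning the prior on a few real views then amounts to minimizing a photometric loss between such rendered pixels and the observed ones.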
Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis
Bühler, Marcel C., Sarkar, Kripasindhu, Shah, Tanmay, Li, Gengyan, Wang, Daoye, Helminger, Leonhard, Orts-Escolano, Sergio, Lagun, Dmitry, Hilliges, Otmar, Beeler, Thabo, Meka, Abhimitra
NeRFs have enabled highly realistic synthesis of human faces, including complex appearance and reflectance effects of hair and skin. These methods typically require a large number of multi-view input images, making the process hardware-intensive and cumbersome and limiting applicability to unconstrained settings. We propose a novel volumetric human face prior that enables the synthesis of ultra high-resolution novel views of subjects that are not part of the prior's training distribution. This prior model consists of an identity-conditioned NeRF, trained on a dataset of low-resolution multi-view images of diverse humans with known camera calibration. A simple sparse landmark-based 3D alignment of the training dataset allows our model to learn a smooth latent space of geometry and appearance despite a limited number of training identities. A high-quality volumetric representation of a novel subject can be obtained by model fitting to 2 or 3 camera views of arbitrary resolution. Importantly, our method requires as few as two views of casually captured images as input at inference time.
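The sparse landmark-based 3D alignment mentioned above can be realized with a least-squares similarity transform between paired landmark sets. As a hedged sketch (this is the classic Umeyama closed-form solution, offered as one plausible way to implement such an alignment, not the paper's exact procedure):

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    such that dst ~= s * R @ src + t, from paired 3D landmarks (Umeyama, 1991)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                    # cross-covariance of centered sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Applying the recovered transform to each subject's landmarks places all training identities in a common canonical frame, which is what lets a conditional NeRF learn a smooth shared latent space from few identities.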