Investigation of Accuracy and Bias in Face Recognition Trained with Synthetic Data

Korshunov, Pavel; Kotwal, Ketan; Ecabert, Christophe; Vidit, Vidit; Mohammadi, Amir; Marcel, Sebastien

arXiv.org Artificial Intelligence 

The use of synthetic data to train face recognition (FR) models has gained increasing attention in recent years, primarily due to its potential to avoid the ethical, legal, and licensing challenges associated with using real facial images, especially in the context of privacy regulations such as the GDPR [1]. Synthetic data makes it possible to generate large-scale datasets for training commercial models without infringing on individual privacy, thereby facilitating the development of safer FR systems. Moreover, it allows more fine-grained control over data generation, which can potentially help mitigate biases in FR systems. Recent work focuses mainly on generating synthetic face datasets that can be used to train FR models with performance approaching that of models trained on real data [1, 3-8]. However, several issues are still missing from the current research discourse on training FR models with synthetic data:

Dual-generator framework: Synthetic face data generation often employs a two-stage process: a seed generator that creates distinct identities and an augmentation generator that produces intra-class variations such as different poses, lighting conditions, and expressions. Although considerable effort is directed toward enhancing the diversity of seed identities [3, 4], the role of augmentation generators in influencing FR performance remains underexplored.

Unfair dataset comparisons: Comparative studies between synthetic and real datasets frequently suffer from inconsistencies in dataset sizes and compositions. For example, a synthetic dataset with 10K identities and 64 images per identity gets compared with another dataset of 50K identities and 20 images per identity, or with WebFace-12M's 1.5M identities and 12M images [2].
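The size mismatch described under "Unfair dataset comparisons" can be made concrete with a little arithmetic. The sketch below only multiplies the identity and per-identity image counts quoted in the text; the dataset labels (`synthetic_a`, `synthetic_b`) and the helper `total_images` are hypothetical names introduced for illustration and do not come from the paper.

```python
# Dataset shapes quoted in the text. The keys are illustrative labels,
# not names of actual datasets.
datasets = {
    "synthetic_a": {"identities": 10_000, "images_per_identity": 64},
    "synthetic_b": {"identities": 50_000, "images_per_identity": 20},
}

# Total image count quoted in the text for WebFace-12M.
WEBFACE_12M_IMAGES = 12_000_000


def total_images(identities: int, images_per_identity: int) -> int:
    """Total number of images in a dataset with a fixed per-identity count."""
    return identities * images_per_identity


totals = {name: total_images(**shape) for name, shape in datasets.items()}

for name, count in totals.items():
    # Ratio of WebFace-12M's image count to this dataset's image count.
    ratio = WEBFACE_12M_IMAGES / count
    print(f"{name}: {count:,} images; WebFace-12M is {ratio:.2f}x larger")
```

Under these numbers the two synthetic configurations already differ in total image count (640,000 versus 1,000,000) as well as in their identity-to-image balance, and both are more than an order of magnitude smaller than the 12M-image real dataset, which is why a direct accuracy comparison between them is hard to interpret.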
