TrustSkin: A Fairness Pipeline for Trustworthy Facial Affect Analysis Across Skin Tone

Cabanas, Ana M., Pedro, Alma, Mery, Domingo

arXiv.org Artificial Intelligence 

-- Understanding how facial affect analysis (FAA) systems perform across different demographic groups requires reliable measurement of sensitive attributes such as ancestry, often approximated by skin tone, which is itself highly influenced by lighting conditions. Using AffectNet and a MobileNet-based model, we assess fairness across skin tone groups defined by each estimation method: the Individual Typology Angle (ITA) and a perceptual skin tone measure. Results reveal a severe underrepresentation of dark skin tones (∼2%), alongside fairness disparities in F1-score (up to 0.08) and true positive rate (TPR, up to 0.11) across groups. Grad-CAM analysis further highlights differences in model attention patterns by skin tone, suggesting variation in feature encoding. To support future mitigation efforts, we also propose a modular fairness-aware pipeline that integrates perceptual skin tone estimation, model interpretability, and fairness evaluation. These findings emphasize the relevance of skin tone measurement choices in fairness assessment and suggest that ITA-based evaluations may overlook disparities affecting darker-skinned individuals.

I. INTRODUCTION

Predictive algorithms and biometric systems are increasingly used in critical areas such as healthcare, security, and human-computer interaction [1]. However, these systems remain prone to bias arising from demographic imbalances in training data and algorithmic design flaws [1]-[3]. In computer vision applications such as EmotionAI and Facial Affect Analysis (FAA), these biases often result in consistent performance disparities across attributes like age, sex, and skin tone [4]-[6]. Given the deployment of FAA in sensitive settings such as psychological evaluation, driver monitoring, and educational feedback [1], [7], [8], ensuring fairness, transparency, and robustness across demographic groups is essential.
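As context for the measurement choices discussed above, the sketch below gives a standard formulation of the Individual Typology Angle (ITA) over CIELAB coordinates, together with one common way to read the reported F1/TPR disparities, namely as the largest minus the smallest per-group value; neither is reproduced from the paper's own notation, so both should be treated as illustrative assumptions.

\[
\mathrm{ITA} = \arctan\!\left(\frac{L^{*} - 50}{b^{*}}\right)\cdot\frac{180}{\pi},
\qquad
\Delta_{\mathrm{F1}} = \max_{g}\,\mathrm{F1}_{g} \;-\; \min_{g}\,\mathrm{F1}_{g},
\]

where $L^{*}$ is the CIELAB lightness, $b^{*}$ the blue/yellow chroma component, and $g$ indexes the skin tone groups (under the widely used del Bino thresholds, dark skin corresponds to ITA $< -30^{\circ}$). Under this reading, the reported F1 disparity of 0.08 means the best- and worst-performing skin tone groups differ by 0.08 in F1-score.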