Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis
Zhao-Yang Wang, Zhimin Shao, Jieneng Chen, Rama Chellappa
arXiv.org Artificial Intelligence
Abstract: Gait recognition is an important biometric for human identification at a distance, particularly in low-resolution or unconstrained environments. Current works typically focus on either 2D representations (e.g., silhouettes and skeletons) or 3D representations (e.g., meshes and SMPL models), but relying on a single modality often fails to capture the full geometric and dynamic complexity of human walking patterns. In this paper, we propose a multi-modal and multi-task framework that combines 2D temporal silhouettes with 3D SMPL features for robust gait analysis. Beyond identification, we introduce a multi-task learning strategy that jointly performs gait recognition and human attribute estimation, including age, body mass index (BMI), and gender. A unified transformer is employed to effectively fuse multi-modal gait features and better learn attribute-related representations while preserving discriminative identity cues. Extensive experiments on the large-scale BRIAR datasets, collected under challenging conditions such as long-range distances (up to 1 km) and extreme pitch angles (up to 50 degrees), demonstrate that our approach outperforms state-of-the-art methods in gait recognition and provides accurate human attribute estimation.
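The paper itself does not publish an implementation in this listing, but the abstract's core idea, projecting 2D silhouette features and 3D SMPL features into a shared token space, fusing them with one transformer encoder, and branching into identity and attribute heads, can be sketched as follows. This is a minimal illustration in PyTorch; all module names, feature dimensions (e.g., a 64x44 silhouette, an 85-dimensional SMPL pose+shape vector), and layer sizes are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ComboGaitSketch(nn.Module):
    """Hypothetical sketch of unified multi-modal, multi-task fusion.

    Fuses per-frame 2D silhouette tokens and 3D SMPL tokens with a shared
    transformer encoder, then branches into a gait-recognition (identity)
    head and attribute heads for age, BMI, and gender. All dimensions are
    illustrative placeholders, not the paper's actual configuration.
    """

    def __init__(self, dim=128, num_ids=100):
        super().__init__()
        self.sil_proj = nn.Linear(64 * 44, dim)   # flattened silhouette per frame (assumed size)
        self.smpl_proj = nn.Linear(85, dim)       # SMPL pose+shape vector per frame (assumed size)
        self.modality_emb = nn.Parameter(torch.zeros(2, dim))  # tells the encoder which modality a token is
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.id_head = nn.Linear(dim, num_ids)    # identity logits for recognition
        self.age_head = nn.Linear(dim, 1)         # regression heads for attributes
        self.bmi_head = nn.Linear(dim, 1)
        self.gender_head = nn.Linear(dim, 2)      # binary classification logits

    def forward(self, sil, smpl):
        # sil: (B, T, 64*44) silhouette frames; smpl: (B, T, 85) SMPL frames
        s = self.sil_proj(sil) + self.modality_emb[0]
        m = self.smpl_proj(smpl) + self.modality_emb[1]
        tokens = torch.cat([s, m], dim=1)          # one token sequence across both modalities
        fused = self.encoder(tokens).mean(dim=1)   # temporal pooling into one gait embedding
        return {
            "id": self.id_head(fused),
            "age": self.age_head(fused),
            "bmi": self.bmi_head(fused),
            "gender": self.gender_head(fused),
        }

model = ComboGaitSketch()
out = model(torch.randn(2, 8, 64 * 44), torch.randn(2, 8, 85))
```

In a multi-task setup like the one the abstract describes, the identity head would typically be trained with a recognition loss (e.g., cross-entropy or triplet) while the attribute heads add regression and classification losses; the shared encoder is what lets attribute learning regularize the identity embedding.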
Oct-14-2025