generalizable person re-identification
TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification: Appendix
For ease and reliable comparison, we report the average of all Rank-1 and mAP results on all test datasets over several random runs for ablation study and parameter analysis. This is denoted by mAcc. There are three reasons that we use mAcc. It is a unified measure, which is convenient for algorithm comparison. Both Rank-1 and mAP are accuracy measures ranging from 0%-100%, thus averaging them is possible. Besides, if a method's mAcc is 1% higher than another method, on average it means that every single measure on each dataset has been increased by 1%, which is a perceptible achievement.
TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification
Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense predictions, and the generalizability of Transformers is unknown. In this work, we further investigate the possibility of applying Transformers for image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
Generalizable Person Re-identification via Balancing Alignment and Uniformity
Domain generalizable person re-identification (DG re-ID) aims to learn discriminative representations that are robust to distributional shifts. While data augmentation is a straightforward solution to improve generalization, certain augmentations exhibit a polarized effect in this task, enhancing in-distribution performance while deteriorating out-of-distribution performance. In this paper, we investigate this phenomenon and reveal that it leads to sparse representation spaces with reduced uniformity. To address this issue, we propose a novel framework, Balancing Alignment and Uniformity (BAU), which effectively mitigates this effect by maintaining a balance between alignment and uniformity. Specifically, BAU incorporates alignment and uniformity losses applied to both original and augmented images and integrates a weighting strategy to assess the reliability of augmented samples, further improving the alignment loss.
TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification
Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense predictions, and the generalizability of Transformers is unknown. In this work, we further investigate the possibility of applying Transformers for image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention. The latter improves the performance, but it is still limited.