

TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification: Appendix

Neural Information Processing Systems

For ease of use and reliable comparison, we report the average of all Rank-1 and mAP results on all test datasets over several random runs for the ablation study and parameter analysis. This is denoted by mAcc. There are three reasons that we use mAcc. It is a unified measure, which is convenient for algorithm comparison. Both Rank-1 and mAP are accuracy measures ranging from 0% to 100%, thus averaging them is possible. Besides, if a method's mAcc is 1% higher than that of another method, it means that, on average, every single measure on each dataset has been increased by 1%, which is a perceptible achievement.
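The averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `macc` function, the nested-dictionary layout, the dataset names, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def macc(results):
    """Average every Rank-1 and mAP value over all datasets and runs.

    `results` is a hypothetical layout: one dict per run, mapping
    dataset name -> {"rank1": percent, "mAP": percent}.
    """
    values = [
        metrics[key]
        for run in results
        for metrics in run.values()
        for key in ("rank1", "mAP")
    ]
    return mean(values)


# Made-up numbers for two runs on two placeholder datasets (not paper results).
baseline = [
    {"dataset_a": {"rank1": 60.0, "mAP": 30.0},
     "dataset_b": {"rank1": 40.0, "mAP": 20.0}},
    {"dataset_a": {"rank1": 62.0, "mAP": 32.0},
     "dataset_b": {"rank1": 42.0, "mAP": 22.0}},
]

# A second hypothetical method that is exactly 1 point better on every measure.
improved = [
    {name: {k: v + 1.0 for k, v in metrics.items()} for name, metrics in run.items()}
    for run in baseline
]

print(macc(baseline))   # 38.5
print(macc(improved))   # 39.5
```

In this toy case, raising every individual measure by one point raises mAcc by exactly one point, which is the reading of a 1% mAcc gain given in the text.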



TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification: Appendix

Neural Information Processing Systems

Some algorithms perform unstably across different runs, thus the average among several runs is a more stable measure. Using a unified measure is convenient, concise, and space-saving for ablation study and parameter analysis. Here H = h and W = w, but to be clear, let's denote them differently. Then in Eq. (7), GMP is applied along the last dimension of hw elements, resulting in a vector of size HW. Third, the proposed method has already considered the efficiency, with its simplified decoder and balanced parameter selection, and thus it is the most efficient one in cross-matching Transformers as shown in Table 2 of the main paper.
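The shape bookkeeping of the GMP step mentioned above can be illustrated with a small Python sketch. It assumes that the matching scores are arranged as an array with HW rows and hw columns, which is one reading of the sentence about Eq. (7); the equation itself is not reproduced here. The function name `global_max_pool_last` and all values are made up for illustration.

```python
def global_max_pool_last(scores):
    """Take the maximum over the last dimension of a 2-D list."""
    return [max(row) for row in scores]


H, W = 2, 2   # first feature map size (toy value)
h, w = 2, 2   # second feature map size; equal to H, W as in the text

# Made-up matching scores: H*W rows, each holding h*w entries.
scores = [
    [0.1, 0.9, 0.3, 0.2],
    [0.5, 0.4, 0.8, 0.1],
    [0.7, 0.2, 0.6, 0.3],
    [0.0, 0.3, 0.2, 0.4],
]
assert len(scores) == H * W and all(len(row) == h * w for row in scores)

pooled = global_max_pool_last(scores)
print(pooled)        # [0.9, 0.8, 0.7, 0.4]
print(len(pooled))   # 4, i.e. H * W
```

Pooling over the hw entries of each row leaves one value per row, so the result has HW elements, matching the vector size stated in the text.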


TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification

Neural Information Processing Systems

The latter improves the performance, but it is still limited. This implies that the attention mechanism in Transformers is primarily designed for global feature aggregation, which is not naturally suitable for image matching.


TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification

Neural Information Processing Systems

Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense predictions, and the generalizability of Transformers is unknown. In this work, we further investigate the possibility of applying Transformers for image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.

