Channel-Temporal Attention for First-Person Video Domain Adaptation

Liu, Xianyuan; Zhou, Shuo; Lei, Tao; Lu, Haiping

Aug-19-2021–arXiv.org Artificial Intelligence

Unsupervised Domain Adaptation (UDA) can transfer knowledge from labeled source data to unlabeled target data of the same categories. However, UDA for first-person action recognition is an under-explored problem, with a lack of datasets and limited consideration of first-person video characteristics. This paper addresses this problem. First, we propose two small-scale first-person video domain adaptation datasets: ADL$_{small}$ and GTEA-KITCHEN. Second, we introduce channel-temporal attention blocks to capture the channel-wise and temporal-wise relationships and to model their inter-dependencies, which are important to first-person vision. Finally, we propose a Channel-Temporal Attention Network (CTAN) to integrate these blocks into existing architectures. CTAN outperforms baselines on the two proposed datasets and on one existing dataset, EPIC$_{cvpr20}$.
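The abstract does not spell out the internals of the channel-temporal attention blocks, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes a generic squeeze-and-excitation-style gate applied to a spatially pooled feature map with C channels and T frames. The names `channel_temporal_gate` and `sigmoid` are hypothetical, and the gates are parameter-free, whereas a trained block would normally use learned layers.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def channel_temporal_gate(features):
    """Rescale a C x T feature map with channel-wise and temporal-wise gates.

    features: list of C channels, each a list of T per-frame activations
    (assumed to be already pooled over the spatial dimensions).
    This is an illustrative, parameter-free gate, not the CTAN block itself.
    """
    num_channels = len(features)
    num_frames = len(features[0])

    # Squeeze: summarize each channel over time, and each frame over channels.
    channel_summary = [sum(row) / num_frames for row in features]
    frame_summary = [
        sum(features[c][t] for c in range(num_channels)) / num_channels
        for t in range(num_frames)
    ]

    # Excite: turn each summary into a gate in (0, 1).
    # Assumption: a learned block would insert small trainable layers here.
    channel_gate = [sigmoid(s) for s in channel_summary]
    frame_gate = [sigmoid(s) for s in frame_summary]

    # Rescale every activation by both gates, so that channel and temporal
    # importance interact multiplicatively.
    return [
        [features[c][t] * channel_gate[c] * frame_gate[t] for t in range(num_frames)]
        for c in range(num_channels)
    ]
```

For a 2 x 2 input of all ones, every entry is scaled by `sigmoid(1.0) ** 2`, roughly 0.534. Multiplying the two gates is one simple way to let channel and temporal importance depend on each other; other combinations (for example, applying the gates sequentially with learned weights) are equally plausible readings of the abstract.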

artificial intelligence, dataset, neural network

Country:
- Asia (0.14)
- Europe > United Kingdom (0.14)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.94)
  - Vision (1.00)