A Survey on Visual Transformer

Han, Kai; Wang, Yunhe; Chen, Hanting; Chen, Xinghao; Guo, Jianyuan; Liu, Zhenhua; Tang, Yehui; Xiao, An; Xu, Chunjing; Xu, Yixing; Yang, Zhaohui; Zhang, Yiman; Tao, Dacheng

Jan-15-2021–arXiv.org Artificial Intelligence

The transformer is a type of deep neural network based mainly on the self-attention mechanism, and it was originally applied in natural language processing. Inspired by its strong representation ability, researchers have proposed extending the transformer to computer vision tasks. Transformer-based models show competitive or even better performance on various visual benchmarks compared with other network types such as convolutional and recurrent networks. With high performance and no human-defined inductive bias, the transformer is receiving more and more attention from the vision community. In this paper, we provide a literature review of these visual transformer models, categorizing them by task and analyzing the advantages and disadvantages of the methods. The main categories are basic image classification, high-level vision, low-level vision, and video processing. Self-attention in computer vision is also briefly revisited, since self-attention is the base component of the transformer. Efficient transformer methods are included as well, because they help push transformers into real applications on devices. Finally, we discuss the challenges and further research directions for visual transformers.

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States
  - California
    - Santa Clara County > Stanford (0.04)
    - San Diego County > San Diego (0.04)

Genre:
- Overview (1.00)

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
