Residual Feature-Reutilization Inception Network for Image Classification
He, Yuanpeng, Song, Wenjie, Li, Lijian, Zhan, Tianxiang, Jiao, Wenpin
–arXiv.org Artificial Intelligence
Deep learning has contributed substantially to this field. The most representative deep neural network architectures in computer vision can be roughly divided into transformer-based and CNN-based models. The transformer was originally proposed for natural language processing and has recently been transferred to vision tasks, where it achieves highly satisfying performance. Specifically, the Vision Transformer [1] first introduced the attention mechanism into computer vision; its strategy of information interaction noticeably enlarges the effective receptive field of related models, so that crucial information can be captured more effectively. Owing to the efficiency of this architecture, transformer variants have been devised for specific demands, and the improvements fall into two main categories: integrating the transformer framework with other models designed for particular purposes, and modifying the original architecture. With respect to the former, DS-TransUNet [2] is a typical example, which combines dual transformer-based architectures with U-Net to achieve a breakthrough in medical image segmentation. Other works focus on improving the transformer architecture itself; for instance, Mix-ViT [3] designs a mixed attention mechanism to create richer pathways for information interaction.
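The global information interaction attributed to the Vision Transformer above comes from scaled dot-product self-attention, in which every token (e.g. every image patch) attends to every other token in one step. The following minimal NumPy sketch is illustrative only, not code from the paper; the weight matrices and toy dimensions are assumptions chosen for demonstration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a token sequence.

    Every token attends to every other token, so each output row mixes
    information from the whole sequence -- the 'global receptive field'
    that distinguishes attention from local convolutions.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project tokens
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # weighted mix of values

# Toy example: 4 "patches" with 8-dimensional embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
n_tokens, d = 4, 8
x = rng.standard_normal((n_tokens, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # one attention-mixed representation per patch
```

Because the attention weights couple all token pairs, the receptive field spans the full input from the first layer, whereas a CNN must stack layers to grow it.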
Dec-26-2024