Review -- ResT: An Efficient Transformer for Visual Recognition
To compress memory, the 2D input token sequence is reshaped into a 3D feature map and fed to a depth-wise convolution (Conv) that reduces the height and width by a factor of s. Because this reduction can impair the diversity of the attention heads, Instance Normalization (IN) is applied to the dot-product matrix (after Softmax) to restore it. A simple yet effective spatial attention module called Pixel Attention (PA) is used to encode positions: PA applies a 3×3 depth-wise convolution (with padding 1) to obtain pixel-wise weights, which are then scaled by a sigmoid function σ.
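The mechanisms described above can be sketched in PyTorch as below. This is a minimal illustration, not the paper's reference code: the module names (`PixelAttention`, `EfficientMSA`), the choice of `dim`, `heads`, and the reduction factor `s` are assumptions for the sketch.

```python
import torch
import torch.nn as nn


class PixelAttention(nn.Module):
    """PA: 3x3 depth-wise conv produces pixel-wise weights, gated by sigmoid."""
    def __init__(self, dim):
        super().__init__()
        # groups=dim makes the convolution depth-wise; padding=1 keeps H, W.
        self.pa_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):  # x: (B, C, H, W)
        return x * self.sigmoid(self.pa_conv(x))


class EfficientMSA(nn.Module):
    """Multi-head self-attention with spatial reduction and IN after Softmax."""
    def __init__(self, dim, heads=2, s=2):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        # Depth-wise conv reduces height and width by a factor s before K, V.
        self.sr = nn.Conv2d(dim, dim, kernel_size=s, stride=s, groups=dim)
        self.kv = nn.Linear(dim, dim * 2)
        # Instance Norm over the attention map restores head diversity.
        self.inst_norm = nn.InstanceNorm2d(heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, H, W):  # x: (B, N, C) with N = H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.heads, C // self.heads).transpose(1, 2)
        # Reshape the 2D token sequence into a 3D map and spatially reduce it.
        x3d = x.transpose(1, 2).reshape(B, C, H, W)
        x_sr = self.sr(x3d).reshape(B, C, -1).transpose(1, 2)  # (B, N/s^2, C)
        kv = self.kv(x_sr).reshape(B, -1, 2, self.heads, C // self.heads)
        k, v = kv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N', C/heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = self.inst_norm(attn.softmax(dim=-1))  # IN after Softmax
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

For a 4×4 token map (`H = W = 4`) with `dim = 8` and `s = 2`, the keys and values are computed on only 4 reduced tokens instead of 16, which is where the memory saving comes from.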
Oct-29-2022, 00:10:18 GMT