The power of Convolution in Vision Transformer

#artificialintelligence 

It is well known today that Transformers are used not only for natural language processing but also play a vital role in computer vision applications, in the form of Vision Transformers (ViT). In fact, their power has been demonstrated time and time again, as seen in their SOTA performance. However, one major drawback of vision transformers is their reliance on huge amounts of data. Another major drawback is their below-average optimizability: vision transformers have been shown to be very sensitive to the type of optimizer used (Adam vs. AdamW vs. SGD, etc.), the choice of learning hyperparameters, the depth of the network, the length of the training schedule, and so on. Researchers have indicated that this drawback results from the "patchify stem", the early visual-processing layer, which is implemented as a convolution with large kernel and stride sizes (16 by default).
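To make the "patchify stem" concrete, here is a minimal NumPy sketch (the function name and shapes are illustrative, not from any specific library): splitting an image into non-overlapping 16x16 patches and flattening each one is mathematically equivalent to a convolution whose kernel size equals its stride, which is exactly the stem described above.

```python
import numpy as np

def patchify_stem(image, patch_size=16):
    """Split an (H, W, C) image into non-overlapping flattened patches.

    This is equivalent to a convolution with kernel = stride = patch_size,
    i.e. the ViT "patchify stem" (before the learned linear projection).
    """
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    # Reshape into a grid of patches, then bring the patch dims together.
    patches = image.reshape(H // patch_size, patch_size,
                            W // patch_size, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch_size * patch_size * C)

image = np.random.rand(224, 224, 3)
tokens = patchify_stem(image)
print(tokens.shape)  # (196, 768): a 14x14 grid of patches, each 16*16*3 values
```

Because each 16x16 patch is processed in a single step with no overlap, the stem has a very coarse, non-hierarchical view of the input, which is one intuition for why replacing it with a stack of small-kernel convolutions can improve optimizability.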
