Swin Transformer: Hierarchical Vision Transformer using Shifted Window -- Part I
So Facebook AI's team came up with DeiT, a data-efficient transformer that was able to outperform SOTA convolutional networks and ViT in terms of the accuracy/FLOPs trade-off. DeiT was trained on no external data, just ImageNet-1K. But it used knowledge distillation and depended on a convolutional network as the teacher, so it was not a completely convolution-free solution. Both DeiT and ViT were designed and tested only for image classification, with the general perception that if a network architecture performs well on image classification, it is expected to do well on other tasks too, because image classification is used as a benchmark for measuring the progress of a technique in the vision domain, and any progress there translates to downstream tasks like detection and segmentation. To my knowledge, there is no other work that used ViT or DeiT as a feature-extraction backbone for tasks other than classification.
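To make the distillation dependence concrete, here is a minimal sketch of the soft-distillation objective in that family of methods (temperature-scaled KL divergence between teacher and student outputs, in the style of Hinton et al.). This is an illustrative NumPy version, not DeiT's actual implementation; DeiT additionally uses a dedicated distillation token and a hard-label variant, which are omitted here, and the function names are my own.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_distillation_loss(student_logits, teacher_logits, T=3.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

# Toy batch of 2 examples, 3 classes: the teacher here stands in for
# the convolutional network DeiT distills from.
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
student = np.array([[1.8, 0.4, -0.9], [0.0, 1.2, 0.5]])
loss = soft_distillation_loss(student, teacher)
```

The loss is zero only when the student matches the teacher's softened distribution exactly, which is what ties DeiT's training signal to the convolutional teacher.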
Feb-13-2022, 07:40:15 GMT