Global Context Vision Transformers -- Nvidia's new SOTA Image Model

Sep-6-2022, 19:00:23 GMT–#artificialintelligence

Nvidia has recently published a new vision transformer, the Global Context Vision Transformer (GC ViT) (Hatamizadeh et al., 2022). GC ViT introduces an architecture that combines global and local attention, allowing it to model both long-range and short-range spatial interactions. The techniques used by the Nvidia researchers let GC ViT model global attention while avoiding the expensive computation that full self-attention over every pair of image tokens would require. GC ViT achieves state-of-the-art (SOTA) results on the ImageNet-1K dataset, surpassing the Swin Transformer by a significant margin. In this article, we take a closer look at the inner workings of GC ViT and the techniques that enabled it to achieve these results.
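
To make the local/global distinction concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that local attention is computed inside non-overlapping windows, and that global attention reuses one set of query tokens, derived from the whole feature map, in every window. The helper names (`window_partition`, `local_attention`, `global_query_attention`) are hypothetical, the learned query/key/value projections are omitted, and plain average pooling stands in for whatever learned module produces the global queries.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, w):
    """Split an (H, W, C) feature map into (num_windows, w*w, C) token groups."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)

def attention(q, k, v):
    """Scaled dot-product attention over the last two axes."""
    scale = q.shape[-1] ** -0.5
    weights = softmax((q @ k.swapaxes(-1, -2)) * scale)
    return weights @ v

def local_attention(x, w):
    """Short-range: queries, keys and values all come from the same window."""
    win = window_partition(x, w)
    return attention(win, win, win)

def global_query_attention(x, w):
    """Long-range: one shared set of global queries attends to each window.

    Assumption: average-pooling the whole map down to a w x w grid stands in
    for a learned global query generator.
    """
    H, W, C = x.shape
    win = window_partition(x, w)
    pooled = x.reshape(w, H // w, w, W // w, C).mean(axis=(1, 3))  # (w, w, C)
    q_global = pooled.reshape(1, w * w, C)  # broadcast across all windows
    return attention(q_global, win, win)

# Usage: an 8x8 map with 4 channels and 4x4 windows gives 4 windows of 16 tokens.
x = np.random.default_rng(0).normal(size=(8, 8, 4))
local_out = local_attention(x, 4)
global_out = global_query_attention(x, 4)
print(local_out.shape, global_out.shape)
```

In both functions each window computes only a (w*w) x (w*w) score matrix, so the cost grows linearly with the number of windows instead of quadratically with the number of pixels. In the global variant, the queries still summarize the entire image, which is how information from outside the window can enter under these assumptions.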
