VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Neural Information Processing Systems 

The Transformer [70] has brought significant progress to natural language processing [17,7,54]. The vision transformer [20] has likewise improved a series of computer vision tasks, including image classification [66,88], object detection [8,37], semantic segmentation [80], object tracking [13,16], and video recognition [6,3].
