VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
–Neural Information Processing Systems
The Transformer [70] has brought significant progress to natural language processing [17, 7, 54]. The vision transformer [20] has likewise improved a series of computer vision tasks, including image classification [66, 88], object detection [8, 37], semantic segmentation [80], object tracking [13, 16], and video recognition [6, 3].
Feb-8-2026, 13:46:41 GMT
- Genre:
- Research Report (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.68)
- Natural Language (1.00)
- Vision (1.00)