VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Neural Information Processing Systems
Notably, our VideoMAE with the vanilla ViT backbone can achieve 87.4% on Kinetics-400, 75.4% on Something-Something V2, 91.3% on UCF101, and 62.6% on HMDB51, without using any extra data.