Extending Video Masked Autoencoders to 128 frames

Neural Information Processing Systems

Video understanding has witnessed significant progress, with recent video foundation models demonstrating strong performance owing to self-supervised pre-training objectives, Masked Autoencoders (MAE) being the design of choice.