
Videos are a rich source of data that depict vast quantities of information about humans and the world. As deep learning has progressed, models havebegun to reliably exhibit various aspects of video understanding, including action recognition (Kay etal.,2017a),object