Efficient Video Representation Learning via Motion-Aware Token Selection
