Bringing Image Structure to Video via Frame-Clip Consistency of Object Tokens
