Bringing Image Structure to Video via Frame-Clip Consistency of Object Tokens

Neural Information Processing Systems 

On the other hand, one does often have access to a small set of annotated images, either within or outside the domain of interest. Here we ask how such images can be leveraged for downstream video understanding tasks.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found