Video OWL-ViT: Temporally-consistent open-world localization in video
