How to implement semantic video search in 5 minutes using OpenAI's CLIP

We'll implement a naive semantic video search in Python using OpenAI's CLIP model (ignoring audio). By the end of the post, we'll get results like this:

Note that "dog" has the highest value, which is what we would hope for, since the image is of a dog. But do the "cat" and "misc" values seem low enough compared to the "dog" value? Looking at the CLIP codebase, we can see that the scores come from a softmax with a temperature parameter: the image-text similarities are multiplied by a learned scale before the softmax, which turns small differences in similarity into large differences in probability. So the model is pretty certain that "a photo of a dog" is the best of the options it was presented with to describe the image.
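Here is a minimal sketch of the scoring described above, written in plain Python so it runs without CLIP installed. The embeddings are made-up 3-dimensional vectors standing in for what `model.encode_image` and `model.encode_text` return in the openai/CLIP package. The helper names (`classify`, `search_frames`, and so on) are hypothetical, and the default scale of 100 is an assumption based on the CLIP paper capping its learned logit scale at 100. Cosine similarities are multiplied by the scale before the softmax, mirroring what CLIP's forward pass does with `logit_scale.exp()`. The same similarities, without the softmax, are enough to rank video frames against a text query.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; CLIP compares L2-normalized embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product of the two vectors after normalizing each to unit length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def softmax_with_scale(scores: Sequence[float], scale: float) -> List[float]:
    """Softmax of scale * scores; a larger scale (lower temperature) gives a sharper distribution."""
    logits = [scale * s for s in scores]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_emb: Vector, prompt_embs: Sequence[Vector], scale: float = 100.0) -> List[float]:
    """Probability that each prompt is the best description of the image (hypothetical helper)."""
    sims = [cosine_similarity(image_emb, p) for p in prompt_embs]
    return softmax_with_scale(sims, scale)


def search_frames(query_emb: Vector, frame_embs: Sequence[Vector], top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (frame_index, similarity) for the top_k frames most similar to the query (hypothetical helper)."""
    scored = [(i, cosine_similarity(query_emb, f)) for i, f in enumerate(frame_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Made-up toy embeddings, NOT real CLIP outputs.
    labels = ["a photo of a dog", "a photo of a cat", "misc"]
    prompt_embs = [[1.0, 0.0, 0.2], [0.7, 0.6, 0.1], [0.2, 0.3, 0.9]]
    image_emb = [0.9, 0.1, 0.2]

    for scale in (1.0, 100.0):
        probs = classify(image_emb, prompt_embs, scale)
        print(f"scale={scale}:", {label: round(p, 4) for label, p in zip(labels, probs)})

    # Pretend each vector is the embedding of one sampled video frame.
    frames = [[0.1, 0.9, 0.1], [0.8, 0.2, 0.1], [0.0, 0.1, 1.0], [0.95, 0.05, 0.2]]
    print("best frames for 'a photo of a dog':", search_frames(prompt_embs[0], frames, top_k=2))
```

Running it shows the effect discussed above: with a scale of 1.0 the three probabilities stay fairly close together, while with 100.0 almost all of the probability mass lands on "a photo of a dog", even though the cosine similarities themselves are unchanged. On the video side, a real pipeline would typically embed the sampled frames once and reuse those embeddings for every query.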