Bottom-up top-down detection transformers for open vocabulary object detection
We perform open vocabulary detection of the objects mentioned in a sentence, using both bottom-up and top-down feedback. Object detection is the fundamental computer vision task of finding all "objects" present in a visual scene. However, this raises the question: what is an object? Typically, the question is side-stepped by defining a vocabulary of categories and then training a model to detect instances of those categories. As a result, if "apple" is not in the vocabulary, the model does not consider it an object.
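The limitation of a fixed vocabulary can be illustrated with a minimal sketch. This is not the paper's model. The label set, the candidate dictionaries, and both helper functions are hypothetical, and plain word matching stands in for the language grounding a real open vocabulary detector would learn.

```python
# Hypothetical fixed label set, standing in for a detector's training categories.
CLOSED_VOCABULARY = {"person", "dog", "car"}


def closed_vocabulary_filter(candidates, vocabulary=CLOSED_VOCABULARY):
    """Keep only candidate detections whose label is in the fixed vocabulary."""
    return [c for c in candidates if c["label"] in vocabulary]


def mentioned_in_sentence(candidates, sentence):
    """Toy stand-in for open vocabulary detection.

    Keeps candidates whose label appears as a word in the sentence.
    A real model would ground phrases in image features instead.
    """
    words = set(sentence.lower().replace(".", " ").split())
    return [c for c in candidates if c["label"] in words]


# Made-up candidate regions with (x_min, y_min, x_max, y_max) boxes.
candidates = [
    {"label": "dog", "box": (10, 20, 110, 220)},
    {"label": "apple", "box": (150, 40, 190, 80)},
]

closed = closed_vocabulary_filter(candidates)
open_vocab = mentioned_in_sentence(candidates, "A dog sits next to an apple.")

print([c["label"] for c in closed])      # "apple" is dropped: not in the vocabulary
print([c["label"] for c in open_vocab])  # both objects are named in the sentence
```

In this sketch the closed-vocabulary filter silently discards the apple, while conditioning on the sentence keeps it, which is the gap that open vocabulary detection aims to close.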
Jan-23-2023, 11:30:54 GMT