Described Object Detection: Liberating Object Detection with Flexible Expressions
Neural Information Processing Systems
Detecting objects based on language information is a popular task that includes Open-Vocabulary Object Detection (OVD) and Referring Expression Comprehension (REC). In this paper, we advance them to a more practical setting called *Described Object Detection* (DOD) by expanding the category names of OVD to flexible language expressions and overcoming REC's limitation of grounding only pre-existing objects. We establish the research foundation for DOD by constructing a *Description Detection Dataset* (D³). This dataset features flexible language expressions, ranging from short category names to long descriptions, and annotates all described objects on all images without omission. By evaluating previous SOTA methods on D³, we identify several troublemakers that cause current REC, OVD, and bi-functional methods to fail.
Jan-20-2025, 02:59:03 GMT