DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Oct-10-2024, 17:28:04 GMT–Neural Information Processing Systems

Open-world object detection, a more general and challenging goal, aims to recognize and localize objects described by arbitrary category names. The recent work GLIP formulates this as a grounding problem by concatenating all category names of detection datasets into sentences, which leads to inefficient interaction between category names. This paper presents DetCLIP, a paralleled visual-concept pre-training method for open-world detection that draws on knowledge enrichment from a designed concept dictionary. To improve learning efficiency, we propose a novel paralleled concept formulation that extracts concepts separately, making better use of heterogeneous datasets (i.e., detection, grounding, and image-text pairs) for training. We further design a concept dictionary (with descriptions), built from various online sources and detection datasets, to provide prior knowledge for each concept.
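The contrast between the two text formulations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy dictionary entries, and the "name, definition" string format are all assumptions made for illustration. It only shows the difference between joining every category name into one sentence and preparing one independent, dictionary-enriched text input per concept.

```python
from typing import Dict, List

# Hypothetical toy dictionary; the entries are illustrative, not taken from the paper.
CONCEPT_DICTIONARY: Dict[str, str] = {
    "person": "a human being",
    "bicycle": "a vehicle with two wheels moved by pedals",
}


def concatenated_prompt(categories: List[str]) -> str:
    """Grounding-style input: all category names joined into a single sentence."""
    return ". ".join(categories) + "."


def paralleled_concepts(categories: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Paralleled-style input: one independent text per concept.

    Each concept is enriched with its dictionary description when one exists;
    the "name, definition" format is an assumption of this sketch.
    """
    inputs = []
    for name in categories:
        definition = dictionary.get(name)
        inputs.append(f"{name}, {definition}" if definition else name)
    return inputs


categories = ["person", "bicycle", "dog"]
single_sentence = concatenated_prompt(categories)
separate_inputs = paralleled_concepts(categories, CONCEPT_DICTIONARY)
```

In the concatenated form a text encoder would see all names in one sequence, whereas in the paralleled form each entry of `separate_inputs` could be encoded on its own, so a concept's text does not depend on which other categories appear alongside it.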

concept dictionary, dictionary-enriched visual-concept paralleled pre-training, open-world detection

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.77)