Bridging the Gap Between Object Detection and User Intent via Query-Modulation

Fornoni, Marco; Yan, Chaochao; Luo, Liangchen; Wilber, Kimberly; Stark, Alex; Cui, Yin; Gong, Boqing; Howard, Andrew

Jun-18-2021–arXiv.org Artificial Intelligence

When interacting with objects through cameras or pictures, users often have a specific intent; for example, they may want to perform a visual search. However, most object detection models ignore this intent and rely on image pixels as their only input. This often leads to incorrect results, such as the lack of a high-confidence detection on the object of interest, or a detection with the wrong class label. In this paper we investigate techniques to modulate standard object detectors so that they explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard object detectors, query-modulated detectors show superior performance at detecting objects for a given label of interest. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors can also outperform specialized referring expression recognition systems. Furthermore, they can be trained to solve query-modulated detection and standard object detection simultaneously.
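As an illustration of how query-conditioned training examples might be derived from ordinary detection annotations, here is a minimal sketch. It is not taken from the paper: the annotation layout (`image_id`, `label`, `box`) and the function name `synthesize_query_examples` are assumptions made for this example, and the paper's actual synthesis procedure may differ. The idea shown is simply that each (image, label) pair becomes one example whose query is the label and whose targets are only the boxes of that label.

```python
from collections import defaultdict


def synthesize_query_examples(annotations):
    """Group detection annotations into (image, query label) examples.

    `annotations` is a list of dicts with the hypothetical keys
    'image_id', 'label' and 'box'. Each returned example keeps only
    the boxes whose label matches the query.
    """
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[(ann["image_id"], ann["label"])].append(ann["box"])
    # Sort the keys so the output order is deterministic.
    return [
        {"image_id": image_id, "query": label, "boxes": boxes}
        for (image_id, label), boxes in sorted(grouped.items())
    ]


# Toy annotations in an assumed (x_min, y_min, x_max, y_max) box format.
annotations = [
    {"image_id": "img1", "label": "dog", "box": (0, 0, 10, 10)},
    {"image_id": "img1", "label": "cat", "box": (5, 5, 20, 20)},
    {"image_id": "img1", "label": "dog", "box": (30, 30, 40, 40)},
    {"image_id": "img2", "label": "car", "box": (1, 2, 3, 4)},
]

examples = synthesize_query_examples(annotations)
for example in examples:
    print(example["image_id"], example["query"], example["boxes"])
```

In this sketch the query is a plain label string; a real system would map it to an embedding before it is used to modulate the detector.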

Genre:
- Research Report (0.40)

Industry:
- Leisure & Entertainment > Sports (0.46)
- Transportation (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Vision (1.00)
