Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Neural Information Processing Systems

A simple and effective way to improve long-tailed object detection (LTOD) is to use extra data to increase the number of training samples for tail classes. However, collecting bounding box annotations, especially for rare categories, is costly and tedious. Therefore, previous studies resort to datasets with image-level labels to increase the number of samples for rare classes by exploring image-level semantics (as shown in Figure 1 (a)). While appealing, directly learning from such data to benefit detection is challenging, since such data lacks the bounding box annotations that are essential for object detection.



PrObeD: Proactive Object Detection Wrapper

Neural Information Processing Systems

These works are regarded as passive approaches to object detection, as they take the input image as is. However, convergence to a global minimum is not guaranteed in neural networks; therefore, we argue that the trained weights in the object detector are not optimal. To rectify this problem, we propose a wrapper based on proactive schemes, PrObeD, which enhances the performance of these object detectors by learning a signal. PrObeD consists of an encoder-decoder architecture, where the encoder network generates an image-dependent signal, termed a template, to encrypt the input images, and the decoder recovers this template from the encrypted images. We propose that learning the optimum template results in an object detector with improved detection performance. The template acts as a mask on the input images to highlight semantics useful for the object detector. Finetuning the object detector with these encrypted images enhances the detection performance for both generic and camouflaged object detection.
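
The masking step described above can be illustrated with a minimal sketch. It is not the PrObeD implementation: `toy_encoder` is a hypothetical, hand-written stand-in for the learned encoder network, and treating "encryption" as element-wise multiplication of image and template is an assumption drawn only from the phrase "acts as a mask". The decoder and the detector finetuning are omitted.

```python
import numpy as np


def toy_encoder(image):
    """Hypothetical stand-in for the learned encoder.

    Maps an H x W x C image to a single-channel, image-dependent
    template with values strictly between 0 and 1.
    """
    gray = image.mean(axis=-1, keepdims=True)  # H x W x 1 brightness map
    # Sigmoid centred on the mean brightness (an arbitrary choice here).
    return 1.0 / (1.0 + np.exp(-(gray - gray.mean())))


def apply_template(image, template):
    """Assumed encryption step: element-wise multiplication (masking)."""
    return image * template  # template broadcasts over the channel axis


rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))      # toy RGB image with values in [0, 1)
template = toy_encoder(image)      # image-dependent template
encrypted = apply_template(image, template)
```

In this toy setting the template only attenuates pixels, so regions it scores higher stay closer to the original intensities; a learned template would be trained jointly with the detector rather than computed by a fixed rule.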






HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

Neural Information Processing Systems

Through extensive experiments on prevalent image datasets, we demonstrate the superiority of HASSOD over existing methods, thereby advancing the state of the art in self-supervised object detection. Notably, we improve Mask AR from 20.2 to 22.5 on LVIS, and from 17.0 to 26.0 on SA-1B.