Deng, Zhidong
PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection
Ren, Botao, Yang, Xue, Yu, Yi, Luo, Junwei, Deng, Zhidong
Single point supervised oriented object detection has gained attention and made initial progress within the community. Unlike approaches that rely on one-shot samples or powerful pretrained models (e.g., SAM), PointOBB has shown promise due to its prior-free nature. In this paper, we propose PointOBB-v2, a simpler, faster, and stronger method to generate pseudo rotated boxes from points without relying on any other prior. Specifically, we first generate a Class Probability Map (CPM) by training the network with non-uniform positive and negative sampling. We show that the CPM is able to learn the approximate object regions and their contours. Then, Principal Component Analysis (PCA) is applied to accurately estimate the orientation and the boundary of objects. By further incorporating a separation mechanism, we resolve the confusion caused by overlapping objects on the CPM, enabling operation in high-density scenarios. Extensive comparisons demonstrate that our method achieves a training speed 15.58× faster and an accuracy improvement of 11.60%/25.15%/21.19% on the DOTA-v1.0/v1.5/v2.0 datasets, respectively, significantly advancing the cutting edge of single point supervised oriented detection in the modular track. Oriented object detection is essential for accurately labeling small and densely packed objects, especially in scenarios like remote sensing imagery, retail analysis, and scene text detection, where Oriented Bounding Boxes (OBBs) provide precise annotations.
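To make the PCA step concrete, below is a minimal NumPy sketch (not the authors' code; the function name and threshold are illustrative) of how an oriented box can be estimated from a thresholded class probability map: the probability-weighted covariance of the foreground pixel coordinates gives the principal axes, and projecting the pixels onto those axes yields the box extents.

```python
import numpy as np

def obb_from_cpm(cpm, threshold=0.5):
    """Hypothetical sketch: estimate an oriented box (cx, cy, w, h, theta)
    from a single-object class probability map via PCA. Not the PointOBB-v2
    implementation."""
    ys, xs = np.nonzero(cpm > threshold)            # foreground pixels
    pts = np.stack([xs, ys], axis=1).astype(float)
    w = cpm[ys, xs]                                 # probabilities as weights
    mean = (pts * w[:, None]).sum(0) / w.sum()      # weighted centroid
    centered = pts - mean
    cov = (centered * w[:, None]).T @ centered / w.sum()
    eigvals, eigvecs = np.linalg.eigh(cov)          # principal axes of the region
    major = eigvecs[:, np.argmax(eigvals)]
    minor = np.array([-major[1], major[0]])
    angle = np.arctan2(major[1], major[0])          # orientation of the major axis
    proj = centered @ np.stack([major, minor]).T    # coordinates in the box frame
    w_box = proj[:, 0].max() - proj[:, 0].min()
    h_box = proj[:, 1].max() - proj[:, 1].min()
    return mean[0], mean[1], w_box, h_box, angle
```

In this simplified view, the separation mechanism mentioned in the abstract would first split the CPM into per-instance regions so that PCA is applied to one object at a time.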
Improving Detection in Aerial Images by Capturing Inter-Object Relationships
Ren, Botao, Xu, Botian, Pu, Yifan, Wang, Jingyi, Deng, Zhidong
In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships. In most modern detection pipelines, however, detection proposals are processed independently, overlooking the underlying relationships between objects. In this work, we introduce a transformer-based approach that captures these inter-object relationships to refine the classification and regression outcomes for detected objects. Building on two-stage detectors, we tokenize the region of interest (RoI) proposals and process them with a transformer encoder. Specific spatial and geometric relations are incorporated into the attention weights and adaptively modulated and regularized. Experimental results demonstrate that the proposed method achieves consistent performance improvements on three benchmarks, DOTA-v1.0, DOTA-v1.5, and HRSC2016, ranking first on both DOTA-v1.5 and HRSC2016. Specifically, our method gains 1.59 mAP on DOTA-v1.0, 4.88 mAP on DOTA-v1.5, and 2.1 mAP on HRSC2016 over the respective baselines.
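A rough PyTorch sketch of the general idea follows (not the paper's exact architecture; module names, layer sizes, and the relation encoding are assumptions): pooled RoI features become tokens, and a learned embedding of pairwise geometric relations between proposals is added to the attention logits before each token is refined.

```python
import torch
import torch.nn as nn

class RelationRefiner(nn.Module):
    """Illustrative sketch of relation-biased self-attention over RoI tokens."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.geo = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, heads))
        self.proj = nn.Linear(dim, dim)

    def forward(self, roi_feats, boxes):
        # roi_feats: (N, dim) pooled RoI features; boxes: (N, 4) as (cx, cy, w, h)
        N = roi_feats.size(0)
        q, k, v = self.qkv(roi_feats).chunk(3, dim=-1)
        q = q.view(N, self.heads, -1)
        k = k.view(N, self.heads, -1)
        v = v.view(N, self.heads, -1)
        attn = torch.einsum('ihd,jhd->hij', q, k) / (q.size(-1) ** 0.5)
        # Pairwise geometric relations: normalized center offsets and log size ratios.
        rel = torch.stack([
            (boxes[:, None, 0] - boxes[None, :, 0]) / boxes[None, :, 2],
            (boxes[:, None, 1] - boxes[None, :, 1]) / boxes[None, :, 3],
            torch.log(boxes[:, None, 2] / boxes[None, :, 2]),
            torch.log(boxes[:, None, 3] / boxes[None, :, 3]),
        ], dim=-1)                                    # (N, N, 4)
        attn = attn + self.geo(rel).permute(2, 0, 1)  # per-head relation bias
        out = torch.einsum('hij,jhd->ihd', attn.softmax(-1), v).reshape(N, -1)
        return roi_feats + self.proj(out)             # residual refinement of RoI tokens
```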
Feedback RoI Features Improve Aerial Object Detection
Ren, Botao, Xu, Botian, Liu, Tengyu, Wang, Jingyi, Deng, Zhidong
Neuroscience studies have shown that the human visual system utilizes high-level feedback information to guide lower-level perception, enabling adaptation to signals of different characteristics. In light of this, we propose the Feedback multi-Level feature Extractor (Flex), which incorporates a similar mechanism into object detection. Flex refines feature selection based on image-wise and instance-level feedback information in response to image quality variation and classification uncertainty. Experimental results show that Flex brings consistent improvements to a range of existing SOTA methods on the challenging aerial object detection datasets DOTA-v1.0, DOTA-v1.5, and HRSC2016. Although the design originates in aerial image detection, further experiments on MS COCO also reveal our module's efficacy in general detection models. Quantitative and qualitative analyses indicate that the improvements are closely related to image quality, matching our motivation.
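As an illustration of feedback-guided feature selection (a conceptual sketch only; layer names and sizes are assumptions, not the Flex implementation), one could use the entropy of a first-pass classification head as instance-level feedback to weight RoI features pooled from different feature-pyramid levels:

```python
import torch
import torch.nn as nn

class FeedbackFeatureSelect(nn.Module):
    """Conceptual sketch of uncertainty-driven multi-level feature fusion."""
    def __init__(self, num_levels=4, dim=256):
        super().__init__()
        # Map instance-level feedback (classification uncertainty) to per-level weights.
        self.gate = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, num_levels))

    def forward(self, level_feats, cls_logits):
        # level_feats: list of (N, dim) RoI features, one tensor per pyramid level
        # cls_logits: (N, num_classes) from a first-pass classification head
        probs = cls_logits.softmax(-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1, keepdim=True)  # (N, 1)
        weights = self.gate(entropy).softmax(-1)         # (N, num_levels)
        stacked = torch.stack(level_feats, dim=1)        # (N, num_levels, dim)
        return (weights.unsqueeze(-1) * stacked).sum(1)  # uncertainty-aware fusion
```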
SelfzCoT: a Self-Prompt Zero-shot CoT from Semantic-level to Code-level for a Better Utilization of LLMs
Lei, Ioktong, Deng, Zhidong
As a way of communicating with users and LLMs such as GPT or PaLM2, prompting has become an increasingly important research topic for better utilization of LLMs. Although simple prompting performs well on single-step questions, it cannot always activate the correct knowledge path for multi-step reasoning tasks. Chain-of-thought (CoT) prompting, which commonly includes Zero-shot CoT and few-shot CoT, is a recently developed method that exposes the reasoning process to the LLM and outperforms simple prompting on three challenging families of reasoning tasks: arithmetic, symbolic, and common-sense reasoning. This paper proposes a code-level self-prompt Zero-shot CoT (SelfzCoT) that uses entity nodes and reasoning paths as a graph-like representation of knowledge to activate deeper knowledge along longer paths within the LLM. It proceeds in three iterative steps of step-by-step reasoning that can be easily adjusted or extended to different kinds of tasks.
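A schematic of such an iterative self-prompting loop is sketched below (the prompt wording, the number of steps, and the `llm` callable are hypothetical, not the paper's exact prompts):

```python
def selfz_cot(question, llm):
    """Illustrative three-step zero-shot self-prompting loop in the spirit of
    SelfzCoT; all prompt strings are assumptions, not the published prompts."""
    # Step 1: elicit the entities and the reasoning path relevant to the question.
    path = llm(f"{question}\nList the key entities and the reasoning path "
               f"connecting them, step by step.")
    # Step 2: expand that path into explicit step-by-step reasoning.
    reasoning = llm(f"{question}\nUsing this reasoning path:\n{path}\n"
                    f"Let's think step by step.")
    # Step 3: extract the final answer from the generated reasoning.
    answer = llm(f"{question}\nReasoning:\n{reasoning}\n"
                 f"Therefore, the final answer is:")
    return answer
```

Here `llm` stands for any text-completion interface; the point is that each step's output is fed back as part of the next prompt.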