 r-cnn





Clustering-based Feature Representation Learning for Oracle Bone Inscriptions Detection

Tao, Ye, Fu, Xinran, Pang, Honglin, Yang, Xi, Li, Chuntao

arXiv.org Artificial Intelligence

Oracle Bone Inscriptions (OBIs) play a crucial role in understanding ancient Chinese civilization. The automated detection of OBIs from rubbing images is a fundamental yet challenging task in digital archaeology, primarily because degradation factors such as noise and cracks limit the effectiveness of conventional detection networks. To address these challenges, we propose a novel clustering-based feature space representation learning method. Our approach leverages the Oracle Bones Character (OBC) font library dataset as prior knowledge to enhance feature extraction in the detection network through clustering-based representation learning. The method incorporates a specialized loss function derived from the clustering results to optimize the feature representation, and this loss is integrated into the total network loss. We validate the effectiveness of our method by conducting experiments on two OBI detection datasets using three mainstream detection frameworks: Faster R-CNN, DETR, and Sparse R-CNN. Across these experiments, all three frameworks demonstrate significant performance improvements.
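
The abstract does not give the form of the clustering-derived loss, so the following is only a minimal sketch of the general idea of adding a clustering term to a detection loss. It assumes the term is the mean squared distance from each feature vector to its nearest cluster centroid; the function names, the `weight` parameter, and the distance-based form are all hypothetical and are not taken from the paper.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def clustering_loss(features: Sequence[Vector], centroids: Sequence[Vector]) -> float:
    """Mean squared distance from each feature to its nearest centroid.

    Assumption: this distance-to-centroid form is illustrative only; the
    paper's actual clustering loss is not specified in the abstract.
    """
    if not features:
        return 0.0
    nearest = [min(squared_distance(f, c) for c in centroids) for f in features]
    return sum(nearest) / len(features)


def total_loss(detection_loss: float,
               features: Sequence[Vector],
               centroids: Sequence[Vector],
               weight: float = 0.1) -> float:
    """Detection loss plus a weighted clustering term (weight is hypothetical)."""
    return detection_loss + weight * clustering_loss(features, centroids)
```

In this sketch the centroids would come from clustering features of the font library images, and the weight balances the clustering term against the detector's own loss.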




Transcending Dimensions using Generative AI: Real-Time 3D Model Generation in Augmented Reality

Behravan, Majid, Haghani, Maryam, Gracanin, Denis

arXiv.org Artificial Intelligence

Traditional 3D modeling requires technical expertise, specialized software, and time-intensive processes, making it inaccessible for many users. Our research aims to lower these barriers by combining generative AI and augmented reality (AR) into a cohesive system that allows users to easily generate, manipulate, and interact with 3D models in real time, directly within AR environments. Utilizing cutting-edge AI models like Shap-E, we address the complex challenges of transforming 2D images into 3D representations in AR environments. Key challenges such as object isolation, handling intricate backgrounds, and achieving seamless user interaction are tackled through advanced object detection methods, such as Mask R-CNN. Evaluation results from 35 participants reveal an overall System Usability Scale (SUS) score of 69.64, with participants who engaged with AR/VR technologies more frequently rating the system significantly higher, at 80.71. This research is particularly relevant for applications in gaming, education, and AR-based e-commerce, offering intuitive model creation for users without specialized skills.
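
The abstract reports SUS scores but not how they were computed. The sketch below shows the standard SUS scoring rule, assuming the study used the usual ten-item questionnaire with responses from 1 to 5; the function names are illustrative and the paper's own analysis code is not shown in the source.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's SUS questionnaire on the 0-100 scale.

    Standard rule: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response), and the sum of the
    ten contributions is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        item_number = index + 1
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item
    return total * 2.5


def mean_sus(all_responses: Sequence[Sequence[int]]) -> float:
    """Average SUS score over participants."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

An overall figure such as the one reported would then be the mean of the per-participant scores.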


An Empirical Study of Methods for Small Object Detection from Satellite Imagery

Yuan, Xiaohui, Chakravarty, Aniv, Gu, Lichuan, Wei, Zhenchun, Lichtenberg, Elinor, Chen, Tian

arXiv.org Artificial Intelligence

This paper reviews object detection methods for finding small objects in remote sensing imagery and provides an empirical evaluation of four state-of-the-art methods to gain insights into method performance and technical challenges. In particular, we use car detection from urban satellite images and bee box detection from satellite images of agricultural lands as application scenarios. Drawing on existing surveys and literature, we identify several top-performing methods for the empirical study. Public, high-resolution satellite image datasets are used in our experiments.


Reviews: FreeAnchor: Learning to Match Anchors for Visual Object Detection

Neural Information Processing Systems

I am raising my score to seven. The authors begin by noting that many existing object detection pipelines include an 'anchor assignment' step: from a large set of candidate bounding boxes (or "anchors") in a generic image frame, the one that best matches the ground truth bounding box, as measured by IoU, is chosen for training, i.e. the object detection and bounding box regression outputs for that anchor are pushed towards the ground truth. The authors note that for objects which don't fill the anchor well (slim objects oriented diagonally, objects with holes, or occluded objects), the best anchor according to this IoU comparison may be actively bad for training as a whole. The authors propose "learning to match", i.e. producing a custom likelihood which promotes both precision and recall of the final result (making reference to terms from the traditional loss function). For each ground truth bounding box, a 'bag of anchors' is selected by ranking anchors by IoU and picking the best n. During training, a different bounding box is selected from this bag for each object, for each backward pass.
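
The two selection rules described in the review can be sketched as follows. This is a minimal illustration assuming axis-aligned boxes given as (x1, y1, x2, y2); the function names are hypothetical, and the learned matching likelihood that picks an anchor from the bag during training is not reproduced here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_anchor(gt: Box, anchors: Sequence[Box]) -> int:
    """Conventional assignment: index of the single anchor with highest IoU."""
    return max(range(len(anchors)), key=lambda i: iou(gt, anchors[i]))


def anchor_bag(gt: Box, anchors: Sequence[Box], n: int) -> List[int]:
    """Indices of the n anchors with the highest IoU, best first."""
    ranked = sorted(range(len(anchors)),
                    key=lambda i: iou(gt, anchors[i]),
                    reverse=True)
    return ranked[:n]
```

With a ground truth box (0, 0, 10, 10) and anchors (0, 0, 10, 10), (5, 5, 15, 15), (0, 0, 5, 10), and (20, 20, 30, 30), the conventional rule picks anchor 0, while a bag of size 2 holds anchors 0 and 2, leaving the choice between them to the training procedure.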


Reviews: Integrated perception with recurrent multi-task neural networks

Neural Information Processing Systems

This paper is crystal clear and the main points are easily accessible. The key idea of integrated learning of representation sharing and output correlation is sound and well executed in the new architecture comprising CNNs, R-CNNs, RNNs and autoencoders. My main concern is the experimental evaluation. There is clear room for improvement: (1) the authors are encouraged to use the standard VOC 2012 dataset instead of the older VOC 2010/2007 datasets, which would make direct comparison of different methods possible; (2) the baseline methods (Independent and Multi-task in Table 1) are too simple to justify the effectiveness of the proposed method, and more recent work on multi-task deep learning should be compared. Note that, although this paper clearly distinguishes itself from the literature, that does not make it sufficient to evaluate the proposed method only against simple baselines.