AITopics | object detection network


object detection network


Supplementary for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Neural Information Processing Systems | Apr-25-2026, 01:34:21 GMT

In this supplementary material, we provide further analyses of the mask prior in Section 1 and of similarity transfer in Section 2. We show visualization results in Section 3 and the performance variance across iterations in Section 4. We also conduct experiments to mine base categories in the target dataset in Section 5, analyze the hyper-parameters in Section 6, and discuss limitations in Section 7. As mentioned in Section 3.2 of the main paper, the mask prior provides coarse pixel-wise category information that improves the ability of the object detection network to locate and identify objects. Our ablation studies (Table 3 in the main paper) have already demonstrated the advantage of the mask prior. To further evaluate its effectiveness, we evaluate the object detection network with and without the mask generator on the VOC test set. Because the target dataset may contain both base categories and novel categories, of which only the novel categories have ground-truth bounding boxes, we evaluate our method on novel categories.

category, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.87)
Information Technology > Artificial Intelligence > Machine Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.42)


Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Neural Information Processing Systems | Apr-25-2026, 01:34:17 GMT

Object detection has achieved promising success, but it requires large-scale fully-annotated data, which is time-consuming and labor-intensive to collect. Therefore, we consider object detection with mixed supervision, which learns novel object categories using weak annotations with the help of full annotations of existing base object categories. Previous works using mixed supervision mainly learn class-agnostic objectness from fully-annotated categories, which can be transferred to upgrade the weak annotations to pseudo full annotations for novel categories. In this paper, we further transfer mask prior and semantic similarity to bridge the gap between novel categories and base categories. Specifically, the ability to use mask prior to help detect objects is learned from base categories and transferred to novel categories. Moreover, the semantic similarity between objects learned from base categories is transferred to denoise the pseudo full annotations for novel categories. Experimental results on three benchmark datasets demonstrate the effectiveness of our method over existing methods.

category, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Supplementary for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Neural Information Processing Systems | Feb-7-2026, 19:25:55 GMT

Our ablation studies (Table 3 in the main paper) have already proved the advantage of mask prior. From Figure 2, we can see that the coarse masks indicate the rough locations of objects, which can help the object detection network predict the bounding boxes. To validate the transferability of our similarity transfer, we evaluate our similarity network trained on the COCO-60 trainval set. We treat the similarity prediction task as a binary classification task, in which the binary label 1 (resp., 0) means that two bounding boxes belong to the same category (resp., different categories). The precision, recall and F1 scores are summarized in Table 1. We observe that the gap between the performance of the similarity network on base categories and novel categories is negligible (e.g., F1 scores 84.9% vs.
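The pairwise evaluation described in this excerpt can be illustrated with a small sketch. It is not the authors' code: the function names `pair_labels` and `pair_metrics`, the category names, and the predictions are hypothetical, and the sketch only shows how a label of 1 for "same category" and 0 for "different categories" leads to precision, recall and F1.

```python
from itertools import combinations


def pair_labels(categories):
    """Binary label per unordered pair of boxes: 1 if same category, else 0."""
    return [1 if a == b else 0 for a, b in combinations(categories, 2)]


def pair_metrics(labels, predictions):
    """Precision, recall and F1 for the positive ("same category") class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: four boxes, two per category, give six pairs.
labels = pair_labels(["cat", "cat", "dog", "dog"])
# Hypothetical similarity-network outputs, already thresholded to 0 or 1.
predictions = [1, 0, 1, 0, 0, 0]
precision, recall, f1 = pair_metrics(labels, predictions)
```

Running the same routine separately on pairs drawn from base categories and pairs drawn from novel categories would give the two sets of scores that the excerpt compares.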

category, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.58)
Information Technology > Artificial Intelligence > Machine Learning (0.36)


MonoUNI: A Unified Vehicle and Infrastructure-side Monocular 3D Object Detection Network with Sufficient Depth Clues

Neural Information Processing Systems | Dec-24-2025, 06:31:50 GMT

Monocular 3D detection on the vehicle side and on the infrastructure side are two important topics in autonomous driving. Due to diverse sensor installations and focal lengths, researchers face the challenge of constructing algorithms for the two topics based on different prior knowledge. In this paper, by taking into account the diversity of pitch angles and focal lengths, we propose a unified optimization target named normalized depth, which unifies the 3D detection problems for the two sides. Furthermore, to enhance the accuracy of monocular 3D detection, a 3D normalized cube depth of obstacles is developed to promote the learning of depth information. We posit that the richness of depth clues is a pivotal factor impacting the detection performance on both the vehicle and infrastructure sides. A richer set of depth clues helps the model learn better spatial knowledge, and the 3D normalized cube depth offers sufficient depth clues. Extensive experiments demonstrate the effectiveness of our approach. Without introducing any extra information, our method, named MonoUNI, achieves state-of-the-art performance on five widely used monocular 3D detection benchmarks, including Rope3D and DAIR-V2X-I for the infrastructure side, KITTI and Waymo for the vehicle side, and nuScenes for the cross-dataset evaluation.

monouni, object detection network, unified vehicle, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.98)


MonoUNI: A Unified Vehicle and Infrastructure-side Monocular 3D Object Detection Network with Sufficient Depth Clues

Neural Information Processing Systems | Oct-10-2024, 13:47:50 GMT

infrastructure side, object detection network, sufficient depth clue, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.40)


A Fourier-enhanced multi-modal 3D small object optical mark recognition and positioning method for percutaneous abdominal puncture surgical navigation

Guo, Zezhao, Guo, Yanzhong, Zhao, Zhanfang

arXiv.org Artificial Intelligence | Apr-13-2024

Navigation for thoracoabdominal puncture surgery is used to locate the needle entry point on the patient's body surface. With the traditional reflective ball navigation method, it is difficult to position the needle entry point on the soft, irregular, smooth chest and abdomen. Because the body surface lacks clear characteristic points under structured light technology, it is difficult to identify and locate arbitrary needle insertion points. Based on the high stability and high accuracy requirements of surgical navigation, this paper proposes a novel multi-modal 3D small object medical marker detection method, which identifies the center of a small single ring as the needle insertion point. Moreover, the method leverages Fourier transform enhancement technology to augment the dataset, enrich image details, and enhance the network's capability. The method extracts the Region of Interest (ROI) of the feature image from both enhanced and original images, followed by generating a mask map. Subsequently, the point cloud of the ROI from the depth map is obtained through the registration of ROI point cloud contour fitting. In addition, the method employs Tukey loss for optimal precision. The experimental results show that the proposed method not only achieves high-precision and high-stability positioning, but also enables the positioning of any needle insertion point.
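The abstract names Tukey loss without giving its form. As a rough illustration, the sketch below implements the standard Tukey biweight loss on residuals; whether the paper uses exactly this form is an assumption. The function names and the default tuning constant c = 4.685 (a common choice for residuals scaled to unit variance) are also assumptions, not values taken from the paper.

```python
def tukey_biweight_loss(residual, c=4.685):
    """Standard Tukey biweight loss for one residual.

    Close to quadratic for small residuals and constant (c*c/6) once
    |residual| >= c, so large outliers stop influencing the fit.
    """
    u = min(abs(residual) / c, 1.0)  # clamp so the loss saturates beyond c
    return (c * c / 6.0) * (1.0 - (1.0 - u * u) ** 3)


def mean_tukey_loss(predicted, target, c=4.685):
    """Average the loss over paired predictions and targets."""
    losses = [tukey_biweight_loss(p - t, c) for p, t in zip(predicted, target)]
    return sum(losses) / len(losses)
```

The saturation is the point of the design: a single badly mislocated point contributes at most c*c/6 to the total, unlike a squared error, which grows without bound.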

depth map, detection, point cloud, (14 more...)

arXiv.org Artificial Intelligence

2404.0899

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.88)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Transformer based Multitask Learning for Image Captioning and Object Detection

Basak, Debolena, Srijith, P. K., Desarkar, Maunendra Sankar

arXiv.org Artificial Intelligence | Mar-10-2024

In several real-world scenarios such as autonomous navigation and mobility, image captioning and object detection play a crucial role in obtaining a better visual understanding of the surroundings. This work introduces a novel multitask learning framework that combines image captioning and object detection into a joint model. We propose TICOD, a Transformer-based Image Captioning and Object Detection model that trains both tasks jointly by combining the losses obtained from the image captioning and object detection networks. By leveraging joint training, the model benefits from the complementary information shared between the two tasks, leading to improved performance for image captioning. Our approach utilizes a transformer-based architecture that enables end-to-end network integration for image captioning and object detection and performs both tasks jointly. We evaluate the effectiveness of our approach through comprehensive experiments on the MS-COCO dataset. Our model outperforms the baselines from image captioning literature by achieving a 3.65% improvement in BERTScore.

caption, detection, transformer, (14 more...)

arXiv.org Artificial Intelligence

2403.06292

Country:

Asia > India > Telangana > Hyderabad (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Improving Warped Planar Object Detection Network For Automatic License Plate Recognition

Tra, Nguyen Dinh, Tri, Nguyen Cong, Hung, Phan Duy

arXiv.org Artificial Intelligence | Dec-14-2022

This paper aims to improve the Warped Planar Object Detection Network (WPOD-Net) using feature engineering to increase accuracy. What problems does feature engineering solve for WPOD-Net? More specifically, we think that it makes sense to add knowledge about edges in the image to enhance the information used to determine the license plate contour in the original WPOD-Net model. The Sobel filter was selected experimentally and acts as a Convolutional Neural Network layer; the edge information is combined with the existing information of the original network to create the final embedding vector. The proposed model was compared with the original model on a dataset that we collected for evaluation. The results are evaluated using the Quadrilateral Intersection over Union value and show that the model improves performance significantly.
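To make concrete the idea of a Sobel filter acting as a convolutional layer, here is a minimal pure-Python sketch. It applies the two fixed 3x3 Sobel kernels as a convolution layer would, using cross-correlation with no padding, and returns the gradient magnitude. The function names are hypothetical. How the paper combines this edge map with WPOD-Net features is not specified in the abstract, so that step is not shown.

```python
import math

# Fixed (non-learned) 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate2d_valid(image, kernel):
    """Slide the kernel over the image without padding, as a conv layer does."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def sobel_edge_magnitude(image):
    """Per-pixel gradient magnitude from the two Sobel responses."""
    gx = correlate2d_valid(image, SOBEL_X)
    gy = correlate2d_valid(image, SOBEL_Y)
    return [[math.hypot(x, y) for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Hypothetical 4x4 grayscale image with a vertical step edge in the middle.
step_image = [[0, 0, 1, 1] for _ in range(4)]
edges = sobel_edge_magnitude(step_image)
```

In a deep learning framework, the same effect is usually obtained by loading these kernels into a convolution layer and freezing its weights, so the edge map can be concatenated with or added to learned feature maps.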

artificial intelligence, deep learning, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2212.07066

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)


Building the world's largest outdoor AI artwork

#artificialintelligence | Sep-28-2020, 09:10:06 GMT

The idea was straightforward: create an interactive artwork that highlights Baden-Württemberg and its Cyber Valley Initiative as the epicenter for artificial intelligence on our continent. To do this properly, we had to think big and act fast, as the exhibition would start in a mere five weeks. It was too tempting to build something monumental to be set out in the public space. Put simply, our plan involved capturing the scene at the front, feeding it to a supercomputer and teleporting the results into a gigantic display. Thereby, the work shows an altered mirror image of reality, all created by a live-dreaming AI. We wanted the observer to dive in and become part of an art piece that is generated by machine intelligence and live-streamed to the internet.

artificial intelligence, artwork, largest outdoor ai artwork, (5 more...)

#artificialintelligence

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.06)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.06)

Technology: Information Technology > Artificial Intelligence (1.00)


Evolution Of Object Detection Networks - YouTube

#artificialintelligence | Sep-22-2019, 05:36:44 GMT

Intuition lectures on topics ranging from classical CV techniques like HOG and SIFT to Convolutional Neural Network based techniques like OverFeat, Faster R-CNN, etc. You will learn how the ideas have evolved from some of the earliest papers to current ones.

convolutional neural network, evolution, object detection network, (7 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
