AITopics | Object-Oriented Architecture

Collaborating Authors

Object-Oriented Architecture

News Overviews Instructional Materials AI-Alerts Classics

Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation

Tang, Feilong, Xu, Zhongxing, Qu, Zhaojun, Feng, Wei, Jiang, Xingjian, Ge, Zongyuan

arXiv.org Artificial IntelligenceMar-12-2024

Recent weakly supervised semantic segmentation (WSSS) methods strive to incorporate contextual knowledge to improve the completeness of class activation maps (CAM). In this work, we argue that the knowledge bias between instances and contexts affects the capability of the prototype to sufficiently understand instance semantics. Inspired by prototype learning theory, we propose leveraging prototype awareness to capture diverse and fine-grained feature attributes of instances. The hypothesis is that contextual prototypes might erroneously activate similar and frequently co-occurring object categories due to this knowledge bias. Therefore, we propose to enhance the prototype representation ability by mitigating the bias to better capture spatial coverage in semantic object regions. With this goal, we present a Context Prototype-Aware Learning (CPAL) strategy, which leverages semantic context to enrich instance comprehension. The core of this method is to accurately capture intra-class variations in object features through context-aware prototypes, facilitating the adaptation to the semantic attributes of various instances. We design feature distribution alignment to optimize prototype awareness, aligning instance feature distributions with dense features. In addition, a unified training framework is proposed to combine label-guided classification supervision and prototypes-guided self-supervision. Experimental results on PASCAL VOC 2012 and MS COCO 2014 show that CPAL significantly improves off-the-shelf methods and achieves state-of-the-art performance. The project is available at https://github.com/Barrett-python/CPAL.

prototype, segmentation, semantic segmentation, (14 more...)

arXiv.org Artificial Intelligence

2403.0763

Country:

North America > United States > Michigan (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)

Add feedback

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

Sarkar, Rohan, Kak, Avinash

arXiv.org Artificial IntelligenceFeb-29-2024

In the context of pose-invariant object recognition and retrieval, we demonstrate that it is possible to achieve significant improvements in performance if both the category-based and the object-identity-based embeddings are learned simultaneously during training. In hindsight, that sounds intuitive because learning about the categories is more fundamental than learning about the individual objects that correspond to those categories. However, to the best of what we know, no prior work in pose-invariant learning has demonstrated this effect. This paper presents an attention-based dual-encoder architecture with specially designed loss functions that optimize the inter- and intra-class distances simultaneously in two different embedding spaces, one for the category embeddings and the other for the object-level embeddings. The loss functions we have proposed are pose-invariant ranking losses that are designed to minimize the intra-class distances and maximize the inter-class distances in the dual representation spaces. We demonstrate the power of our approach with three challenging multi-view datasets, ModelNet-40, ObjectPI, and FG3D. With our dual approach, for single-view object recognition, we outperform the previous best by 20.0% on ModelNet40, 2.0% on ObjectPI, and 46.5% on FG3D. On the other hand, for single-view object retrieval, we outperform the previous best by 33.7% on ModelNet40, 18.8% on ObjectPI, and 56.9% on FG3D.

category, machine learning, object-oriented architecture, (17 more...)

arXiv.org Artificial Intelligence

2403.00272

Country: North America > United States (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.55)

Add feedback

VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks

Wang, Yutong, Jiang, Chaoyang, Chen, Xieyuanli

arXiv.org Artificial IntelligenceFeb-26-2024

In recent years, object-oriented simultaneous localization and mapping (SLAM) has attracted increasing attention due to its ability to provide high-level semantic information while maintaining computational efficiency. Some researchers have attempted to enhance localization accuracy by integrating the modeled object residuals into bundle adjustment. However, few have demonstrated better results than feature-based visual SLAM systems, as the generic coarse object models, such as cuboids or ellipsoids, are less accurate than feature points. In this paper, we propose a Visual Object Odometry and Mapping framework VOOM using high-level objects and low-level points as the hierarchical landmarks in a coarse-to-fine manner instead of directly using object residuals in bundle adjustment. Firstly, we introduce an improved observation model and a novel data association method for dual quadrics, employed to represent physical objects. It facilitates the creation of a 3D map that closely reflects reality. Next, we use object information to enhance the data association of feature points and consequently update the map. In the visual object odometry backend, the updated map is employed to further optimize the camera pose and the objects. Meanwhile, local bundle adjustment is performed utilizing the objects and points-based covisibility graphs in our visual object mapping process. Experiments show that VOOM outperforms both object-oriented SLAM and feature points SLAM systems such as ORB-SLAM2 in terms of localization. The implementation of our method is available at https://github.com/yutongwangBIT/VOOM.git.

map point, mapping, orb-slam2, (15 more...)

arXiv.org Artificial Intelligence

2402.13609

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.56)

Add feedback

Semantic Object-level Modeling for Robust Visual Camera Relocalization

Zhu, Yifan, Miao, Lingjuan, Wu, Haitao, Zhou, Zhiqiang, Chen, Weiyi, Wu, Longwen

arXiv.org Artificial IntelligenceFeb-10-2024

Visual relocalization is crucial for autonomous visual localization and navigation of mobile robotics. Due to the improvement of CNN-based object detection algorithm, the robustness of visual relocalization is greatly enhanced especially in viewpoints where classical methods fail. However, ellipsoids (quadrics) generated by axis-aligned object detection may limit the accuracy of the object-level representation and degenerate the performance of visual relocalization system. In this paper, we propose a novel method of automatic object-level voxel modeling for accurate ellipsoidal representations of objects. As for visual relocalization, we design a better pose optimization strategy for camera pose recovery, to fully utilize the projection characteristics of 2D fitted ellipses and the 3D accurate ellipsoids. All of these modules are entirely intergrated into visual SLAM system. Experimental results show that our semantic object-level mapping and object-based visual relocalization methods significantly enhance the performance of visual relocalization in terms of robustness to new viewpoints.

ellipsoid, relocalization, representation, (16 more...)

arXiv.org Artificial Intelligence

2402.06951

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Beijing > Beijing (0.06)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.72)

Add feedback

CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps

Matsuzaki, Shigemichi, Sugino, Takuma, Tanaka, Kazuhito, Sha, Zijun, Nakaoka, Shintaro, Yoshizawa, Shintaro, Shintani, Kazuhiro

arXiv.org Artificial IntelligenceFeb-8-2024

This paper describes a multi-modal data association method for global localization using object-based maps and camera images. In global localization, or relocalization, using object-based maps, existing methods typically resort to matching all possible combinations of detected objects and landmarks with the same object category, followed by inlier extraction using RANSAC or brute-force search. This approach becomes infeasible as the number of landmarks increases due to the exponential growth of correspondence candidates. In this paper, we propose labeling landmarks with natural language descriptions and extracting correspondences based on conceptual similarity with image observations using a Vision Language Model (VLM). By leveraging detailed text information, our approach efficiently extracts correspondences compared to methods using only object categories. Through experiments, we demonstrate that the proposed method enables more accurate global localization with fewer iterations compared to baseline methods, exhibiting its efficiency.

correspondence, landmark, localization, (16 more...)

arXiv.org Artificial Intelligence

2402.06092

Country: Asia > Japan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Enhancing Embodied Object Detection through Language-Image Pre-training and Implicit Object Memory

Chapman, Nicolas Harvey, Dayoub, Feras, Browne, Will, Lehnert, Chris

arXiv.org Artificial IntelligenceFeb-6-2024

Deep-learning and large scale language-image training have produced image object detectors that generalise well to diverse environments and semantic classes. However, single-image object detectors trained on internet data are not optimally tailored for the embodied conditions inherent in robotics. Instead, robots must detect objects from complex multi-modal data streams involving depth, localisation and temporal correlation, a task termed embodied object detection. Paradigms such as Video Object Detection (VOD) and Semantic Mapping have been proposed to leverage such embodied data streams, but existing work fails to enhance performance using language-image training. In response, we investigate how an image object detector pre-trained using language-image data can be extended to perform embodied object detection. We propose a novel implicit object memory that uses projective geometry to aggregate the features of detected objects across long temporal horizons. The spatial and temporal information accumulated in memory is then used to enhance the image features of the base detector. When tested on embodied data streams sampled from diverse indoor scenes, our approach improves the base object detector by 3.09 mAP, outperforming alternative external memories designed for VOD and Semantic Mapping. Our method also shows a significant improvement of 16.90 mAP relative to baselines that perform embodied object detection without first training on language-image data, and is robust to sensor noise and domain shift experienced in real-world deployment.

detection, detector, external memory, (16 more...)

arXiv.org Artificial Intelligence

2402.03721

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

Add feedback

Universal Imitation Games

Mahadevan, Sridhar

arXiv.org Artificial IntelligenceFeb-1-2024

Alan Turing proposed in 1950 a framework called an imitation game to decide if a machine could think. Using mathematics developed largely after Turing -- category theory -- we analyze a broader class of universal imitation games (UIGs), which includes static, dynamic, and evolutionary games. In static games, the participants are in a steady state. In dynamic UIGs, "learner" participants are trying to imitate "teacher" participants over the long run. In evolutionary UIGs, the participants are competing against each other in an evolutionary game, and participants can go extinct and be replaced by others with higher fitness. We use the framework of category theory -- in particular, two influential results by Yoneda -- to characterize each type of imitation game. Universal properties in categories are defined by initial and final objects. We characterize dynamic UIGs where participants are learning by inductive inference as initial algebras over well-founded sets, and contrast them with participants learning by conductive inference over the final coalgebra of non-well-founded sets. We briefly discuss the extension of our categorical framework for UIGs to imitation games on quantum computers.

arbitrary category, dinatural transformation, universal coalgebra jacob, (15 more...)

arXiv.org Artificial Intelligence

2405.0154

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.27)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.13)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(19 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.92)
Information Technology (0.87)
(2 more...)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
(13 more...)

Add feedback

SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

Lin, Ci-Siang, Wang, Chien-Yi, Wang, Yu-Chiang Frank, Chen, Min-Hung

arXiv.org Artificial IntelligenceJan-22-2024

Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models using training image data with only image-level supervision. Since precise pixel-level annotations are not accessible, existing methods typically focus on producing pseudo masks for training segmentation models by refining CAM-like heatmaps. However, the produced heatmaps may only capture discriminative image regions of target object categories or the associated co-occurring backgrounds. To address the issues, we propose a Semantic Prompt Learning for WSSS (SemPLeS) framework, which learns to effectively prompt the CLIP space to enhance the semantic alignment between the segmented regions and the target object categories. More specifically, we propose Contrastive Prompt Learning and Class-associated Semantic Refinement to learn the prompts that adequately describe and suppress the image backgrounds associated with each target object category. In this way, our proposed framework is able to perform better semantic matching between object regions and the associated text labels, resulting in desired pseudo masks for training the segmentation model. The proposed SemPLeS framework achieves SOTA performance on the standard WSSS benchmarks, PASCAL VOC and MS COCO, and demonstrated interpretability with the semantic visualization of our learned prompts. The codes will be released.

prompt learning, segmentation, semantic segmentation, (14 more...)

arXiv.org Artificial Intelligence

2401.11791

Country: Asia > Taiwan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Panoptic Vision-Language Feature Fields

Chen, Haoran, Blomqvist, Kenneth, Milano, Francesco, Siegwart, Roland

arXiv.org Artificial IntelligenceJan-18-2024

Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods are able to segment scenes into arbitrary classes based on text descriptions provided during runtime. In this paper, we propose to the best of our knowledge the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to the state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica dataset and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture. Our code will be available at https://github.com/ethz-asl/pvlff.

feature field, segmentation, semantic segmentation, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2024.3354624

2309.05448

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.34)

Add feedback

Object-Oriented Semantic Mapping for Reliable UAVs Navigation

Canh, Thanh Nguyen, Elibol, Armagan, Chong, Nak Young, HoangVan, Xiem

arXiv.org Artificial IntelligenceJan-16-2024

To autonomously navigate in real-world environments, special in search and rescue operations, Unmanned Aerial Vehicles (UAVs) necessitate comprehensive maps to ensure safety. However, the prevalent metric map often lacks semantic information crucial for holistic scene comprehension. In this paper, we proposed a system to construct a probabilistic metric map enriched with object information extracted from the environment from RGB-D images. Our approach combines a state-of-the-art YOLOv8-based object detection framework at the front end and a 2D SLAM method - CartoGrapher at the back end. To effectively track and position semantic object classes extracted from the front-end interface, we employ the innovative BoT-SORT methodology. A novel association method is introduced to extract the position of objects and then project it with the metric map. Unlike previous research, our approach takes into reliable navigating in the environment with various hollow bottom objects. The output of our system is a probabilistic map, which significantly enhances the map's representation by incorporating object-specific attributes, encompassing class distinctions, accurate positioning, and object heights. A number of experiments have been conducted to evaluate our proposed approach. The results show that the robot can effectively produce augmented semantic maps containing several objects (notably chairs and desks). Furthermore, our system is evaluated within an embedded computer - Jetson Xavier AGX unit to demonstrate the use case in real-world applications.

detection, information, point cloud, (13 more...)

arXiv.org Artificial Intelligence

2401.08132

Country:

Asia > Japan (0.05)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Aerospace & Defense (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.70)

Add feedback