AITopics | Object-Oriented Architecture

Ten Years of Teaching Empirical Software Engineering in the context of Energy-efficient Software

Malavolta, Ivano, Stoico, Vincenzo, Lago, Patricia

arXiv.org Artificial Intelligence, Jul-8-2024

In this chapter we share our experience in running ten editions of the Green Lab course at the Vrije Universiteit Amsterdam, the Netherlands. The course is given in the Software Engineering and Green IT track of the Computer Science Master program of the VU. The course takes place every year over a 2-month period and teaches Computer Science students the fundamentals of Empirical Software Engineering in the context of energy-efficient software. The peculiarity of the course is its research orientation: at the beginning of the course the instructor presents a catalog of scientifically relevant goals, and each team of students signs up for one of them and works together for 2 months on their own experiment to achieve the goal. Each team goes over the classic steps of an empirical study, starting from a precise formulation of the goal and research questions to context definition, selection of experimental subjects and objects, definition of experimental variables, experiment execution, data analysis, and reporting. Over the years, the course became well-known within the Software Engineering community since it led to several scientific studies that have been published at various scientific conferences and journals. Also, students execute their experiments using open-source tools, which are developed and maintained by researchers and other students within the program, thus creating a virtuous community of learners where students exchange ideas, help each other, and learn how to collaboratively contribute to open-source projects in a safe environment.

machine learning, object-oriented architecture, programming language, (17 more...)

arXiv:2407.05689

Country:

Europe > Netherlands > North Holland > Amsterdam (0.25)
Europe > United Kingdom (0.04)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > Experimental Study > Negative Result (0.48)

Industry:

Information Technology (1.00)
Energy (1.00)
Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Software Engineering (1.00)
Information Technology > Data Science (1.00)
(7 more...)

Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots

Ravipati, Siva Krishna, Latif, Ehsan, Parasuraman, Ramviyas, Bhandarkar, Suchendra M.

arXiv.org Artificial Intelligence, Jul-8-2024

Classification of different object surface material types can play a significant role in the decision-making algorithms for mobile robots and autonomous vehicles. RGB-based scene-level semantic segmentation has been well-addressed in the literature. However, improving material recognition using the depth modality and its integration with SLAM algorithms for 3D semantic mapping could unlock new potential benefits in the robotics perception pipeline. To this end, we propose a complementarity-aware deep learning approach for RGB-D-based material classification built on top of an object-oriented pipeline. The approach further integrates the ORB-SLAM2 method for 3D scene mapping with multiscale clustering of the detected material semantics in the point cloud map generated by the visual SLAM algorithm. Extensive experimental results with existing public datasets and newly contributed real-world robot datasets demonstrate a significant improvement in material classification and 3D clustering accuracy compared to state-of-the-art approaches for 3D semantic scene mapping.

mapping, material classification, point cloud, (14 more...)

arXiv:2407.06077

Country:

North America > United States > Georgia > Clarke County > Athens (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Multi-Object Hallucination in Vision-Language Models

Chen, Xuweiyi, Ma, Ziqiao, Zhang, Xuejun, Xu, Sihan, Qian, Shengyi, Yang, Jianing, Fouhey, David F., Chai, Joyce

arXiv.org Artificial Intelligence, Jul-8-2024

Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent objects or become distracted) when tasked with focusing on multiple objects simultaneously. We introduce Recognition-based Object Probing Evaluation (ROPE), an automated evaluation protocol that considers the distribution of object classes within a single image during testing and uses visual referring prompts to eliminate ambiguity. With comprehensive empirical studies and analysis of potential factors leading to multi-object hallucination, we found that (1) LVLMs suffer more hallucinations when focusing on multiple objects compared to a single object; (2) the tested object class distribution affects hallucination behaviors, indicating that LVLMs may follow shortcuts and spurious correlations; and (3) hallucinatory behaviors are influenced by data-specific factors, salience and frequency, and model intrinsic behaviors. We hope to enable LVLMs to recognize and reason about multiple objects that often occur in realistic visual scenes, provide insights, and quantify our progress towards mitigating the issues.

hallucination, lvlm, preprint arxiv, (16 more...)

arXiv:2407.06192

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

ClutterGen: A Cluttered Scene Generator for Robot Learning

Jia, Yinsen, Chen, Boyuan

arXiv.org Artificial Intelligence, Jul-7-2024

Simulation has played an important role in advancing robot learning [1, 2, 3, 4] by providing a controlled yet versatile environment for developing and testing algorithms. Data-driven approaches, in particular, typically deploy robots into simulations to undergo extensive training across a variety of diverse and randomized settings to enable generalizable and adaptable behaviors. Significant advancements in robot learning have been achieved by randomizing object shapes [4, 5], textures [6, 7, 8, 9], and dynamics [10]. However, the layout of objects, despite being another critical factor, remains difficult to randomize in a fully open-ended way. Unlike object properties, which can be easily specified within a range without interfering with other objects, object layout must consider the presence of other objects and physical feasibility. For instance, arranging objects in a scene requires ensuring that they do not overlap and are placed in stable positions instead of falling down from the air. Existing efforts often avoid this issue by fixing the object bases [11, 4, 12, 13], but this strategy is not suitable for many objects like bottles or cups. As the number of objects increases within a limited space, generating a randomized yet stable object layout becomes exponentially more difficult.
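To make the layout constraint concrete, here is a minimal sketch of a naive rejection-sampling baseline, not ClutterGen's method. It assumes objects are equal-radius circles on a flat square table and checks only for overlap; physical stability is not modeled. The names `overlaps` and `sample_layout` are hypothetical.

```python
import random
from typing import List, Tuple

Circle = Tuple[float, float, float]  # (x, y, radius); a simplifying assumption


def overlaps(a: Circle, b: Circle) -> bool:
    """Two circles overlap when their centers are closer than the sum of radii."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    return dx * dx + dy * dy < (a[2] + b[2]) ** 2


def sample_layout(
    n: int,
    radius: float,
    size: float = 1.0,
    max_tries: int = 10_000,
    seed: int = 0,
) -> Tuple[List[Circle], int]:
    """Place up to n non-overlapping circles by rejection sampling.

    Returns the placed circles and the number of candidates drawn.
    """
    rng = random.Random(seed)
    placed: List[Circle] = []
    tries = 0
    while len(placed) < n and tries < max_tries:
        tries += 1
        # Draw a candidate center that keeps the circle inside the table.
        candidate = (
            rng.uniform(radius, size - radius),
            rng.uniform(radius, size - radius),
            radius,
        )
        # Reject the candidate if it collides with any object already placed.
        if not any(overlaps(candidate, other) for other in placed):
            placed.append(candidate)
    return placed, tries


if __name__ == "__main__":
    layout, tries = sample_layout(n=5, radius=0.1)
    print(f"placed {len(layout)} objects after {tries} candidates")
```

As the table fills up, most candidates are rejected and the sampler may exhaust `max_tries` before placing every object, which is one way to see why open-ended layout randomization is hard with naive sampling.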

arxiv preprint arxiv, cluttergen, placement, (13 more...)

arXiv:2407.05425

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.54)

Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection

Yang, Zhiqiang, Guan, Qiu, Zhao, Keer, Yang, Jianmin, Xu, Xinli, Long, Haixia, Tang, Ying

arXiv.org Artificial Intelligence, Jul-5-2024

Due to the effective performance of multi-scale feature fusion, Path Aggregation FPN (PAFPN) is widely employed in YOLO detectors. However, it cannot efficiently and adaptively integrate high-level semantic information with low-level spatial information simultaneously. We propose a new model named MAF-YOLO in this paper, which is a novel object detection framework with a versatile neck named Multi-Branch Auxiliary FPN (MAFPN). Within MAFPN, the Superficial Assisted Fusion (SAF) module is designed to combine the output of the backbone with the neck, preserving an optimal level of shallow information to facilitate subsequent learning. Meanwhile, the Advanced Assisted Fusion (AAF) module deeply embedded within the neck conveys a more diverse range of gradient information to the output layer. Furthermore, our proposed Re-parameterized Heterogeneous Efficient Layer Aggregation Network (RepHELAN) module ensures that both the overall model architecture and convolutional design make use of heterogeneous large convolution kernels. This preserves information related to small targets while simultaneously achieving a multi-scale receptive field. Finally, the nano version of MAF-YOLO, for example, achieves 42.4% AP on COCO with only 3.76M learnable parameters and 10.51G FLOPs, and outperforms YOLOv8n by about 5.1%. The source code of this work is available at: https://github.com/yang-0201/MAF-YOLO.

architecture, arxiv preprint arxiv, information, (11 more...)

arXiv:2407.04381

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.34)

SlideSLAM: Sparse, Lightweight, Decentralized Metric-Semantic SLAM for Multi-Robot Navigation

Liu, Xu, Lei, Jiuzhou, Prabhu, Ankit, Tao, Yuezhan, Spasojevic, Igor, Chaudhari, Pratik, Atanasov, Nikolay, Kumar, Vijay

arXiv.org Artificial Intelligence, Jul-2-2024

This paper develops a real-time decentralized metric-semantic Simultaneous Localization and Mapping (SLAM) approach that leverages a sparse and lightweight object-based representation to enable a heterogeneous robot team to autonomously explore 3D environments featuring indoor, urban, and forested areas without relying on GPS. We use a hierarchical metric-semantic representation of the environment, including high-level sparse semantic maps of object models and low-level voxel maps. We leverage the informativeness and viewpoint invariance of the high-level semantic map to obtain an effective semantics-driven place-recognition algorithm for inter-robot loop closure detection across aerial and ground robots with different sensing modalities. A communication module is designed to track each robot's own observations and those of other robots whenever communication links are available. Such observations are then used to construct a merged map. Our framework enables real-time decentralized operations onboard robots, allowing them to opportunistically leverage communication. We integrate and deploy our proposed framework on three types of aerial and ground robots. Extensive experimental results show an average inter-robot localization error of approximately 20 cm in position and 0.2 degrees in orientation, an object mapping F1 score consistently over 0.9, and a communication packet size of merely 2-3 megabytes per kilometer trajectory with as many as 1,000 landmarks. The project website can be found at https://xurobotics.github.io/slideslam/.

information, loop closure, robot, (15 more...)

arXiv:2406.17249

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Telecommunications (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

Wang, Yuxuan, Liu, Yijun, Yu, Fei, Huang, Chen, Li, Kexin, Wan, Zhiguo, Che, Wanxiang

arXiv.org Artificial Intelligence, Jul-1-2024

Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from existing English VL datasets. The cultural bias in the images makes these datasets unsuitable for evaluating VLMs in Chinese culture. To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE) benchmark dataset, where the selection of object categories and images is entirely driven by Chinese native speakers, ensuring that the source images are representative of Chinese culture. The benchmark contains four distinct VL tasks ranging from image-text retrieval to visual question answering, visual grounding and visual dialogue. We present a detailed statistical analysis of CVLUE and provide a baseline performance analysis with several open-source multilingual VLMs on CVLUE and its English counterparts to reveal their performance gap between English and Chinese. Our in-depth category-level analysis reveals a lack of Chinese cultural knowledge in existing VLMs. We also find that fine-tuning on Chinese culture-related VL datasets effectively enhances VLMs' understanding of Chinese culture.

category, chinese culture, dataset, (16 more...)

arXiv:2407.01081

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York (0.04)
(20 more...)

Genre: Research Report (0.64)

Industry:

Transportation (0.46)
Leisure & Entertainment > Sports (0.46)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.34)

Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models

Wu, Junfei, Liu, Qiang, Wang, Ding, Zhang, Jinghao, Wu, Shu, Wang, Liang, Tan, Tieniu

arXiv.org Artificial Intelligence, Jun-28-2024

Object hallucination has been an Achilles' heel which hinders the broader applications of large vision-language models (LVLMs). Object hallucination refers to the phenomenon in which LVLMs claim that non-existent objects are present in the image. To mitigate object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scale computational resources or depend on the detection results of external models. However, using the LVLM itself to alleviate object hallucinations remains under-explored. In this work, we adopt the intuition that the LVLM tends to respond logically consistently for existent objects but inconsistently for hallucinated objects. Therefore, we propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether their responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments conducted on three benchmarks across four LVLMs have demonstrated significant improvements brought by our method, indicating its effectiveness and generality.
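The object-to-attribute-to-object probing described above can be illustrated with a small sketch. This is an assumption-laden toy, not the LogicCheckGPT implementation: `ask_attributes` and `ask_objects` are hypothetical callables standing in for prompts to an LVLM, and the closure score is one simple way to quantify whether the answers loop back to the original object.

```python
from typing import Callable, Dict, List


def closure_score(
    obj: str,
    ask_attributes: Callable[[str], List[str]],
    ask_objects: Callable[[str], List[str]],
) -> float:
    """Fraction of an object's attributes whose reverse query returns the object.

    ask_attributes(obj): hypothetical query "what attributes does obj have?"
    ask_objects(attr):   hypothetical query "which objects have attr?"
    """
    attributes = ask_attributes(obj)
    if not attributes:
        # No attributes were reported, so no loop can close.
        return 0.0
    closed = sum(1 for attr in attributes if obj in ask_objects(attr))
    return closed / len(attributes)


if __name__ == "__main__":
    # Toy stand-ins for model answers (made up for illustration only).
    attrs_of: Dict[str, List[str]] = {"dog": ["brown", "furry"], "frisbee": ["red"]}
    objs_with: Dict[str, List[str]] = {
        "brown": ["dog"],
        "furry": ["dog", "cat"],
        "red": ["ball"],
    }

    def ask_attributes(obj: str) -> List[str]:
        return attrs_of.get(obj, [])

    def ask_objects(attr: str) -> List[str]:
        return objs_with.get(attr, [])

    for name in ("dog", "frisbee"):
        print(name, closure_score(name, ask_attributes, ask_objects))
```

In this toy setup a low score flags an object whose answers do not close the loop; any decision threshold would be a further assumption.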

hallucination, logiccheckgpt, lvlm, (15 more...)

arXiv:2402.11622

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Leisure & Entertainment > Sports > Tennis (0.96)
Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.46)

Towards Open-set Camera 3D Object Detection

He, Zhuolin, Li, Xinrun, Gao, Heng, Tang, Jiachen, Qiu, Shoumeng, Wang, Wenfu, Lu, Lvjian, Qiu, Xuchong, Xue, Xiangyang, Pu, Jian

arXiv.org Artificial Intelligence, Jun-26-2024

Traditional camera 3D object detectors are typically trained to recognize a predefined set of known object classes. In real-world scenarios, these detectors may encounter unknown objects outside the training categories and fail to identify them correctly. To address this gap, we present OS-Det3D (Open-set Camera 3D Object Detection), a two-stage training framework enhancing the ability of camera 3D detectors to identify both known and unknown objects. The framework involves our proposed 3D Object Discovery Network (ODN3D), which is specifically trained using geometric cues such as the location and scale of 3D boxes to discover general 3D objects. ODN3D is trained in a class-agnostic manner, and the provided 3D object region proposals inherently come with data noise. To boost accuracy in identifying unknown objects, we introduce a Joint Objectness Selection (JOS) module. JOS selects the pseudo ground truth for unknown objects from the 3D object region proposals of ODN3D by combining the ODN3D objectness and camera feature attention objectness. Experiments on the nuScenes and KITTI datasets demonstrate the effectiveness of our framework in enabling camera 3D detectors to successfully identify unknown objects while also improving their performance on known objects.

detection, machine learning, object-oriented architecture, (17 more...)

arXiv:2406.17297

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.82)

Industry: Transportation > Ground (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)

Unseen Object Reasoning with Shared Appearance Cues

Singh, Paridhi, Kumar, Arun

arXiv.org Artificial Intelligence, Jun-21-2024

This paper introduces an innovative approach to open world recognition (OWR), where we leverage knowledge acquired from known objects to address the recognition of previously unseen objects. The traditional method of object modeling relies on supervised learning with strict closed-set assumptions, presupposing that objects encountered during inference are already known at the training phase. However, this assumption proves inadequate for real-world scenarios due to the impracticality of accounting for the immense diversity of objects. Our hypothesis posits that object appearances can be represented as collections of "shareable" mid-level features, arranged in constellations to form object instances. By adopting this framework, we can efficiently dissect and represent both known and unknown objects in terms of their appearance cues. Our paper introduces a straightforward yet elegant method for modeling novel or unseen objects, utilizing established appearance cues and accounting for inherent uncertainties. This representation not only enables the detection of out-of-distribution objects or novel categories among unseen objects but also facilitates a deeper level of reasoning, empowering the identification of the superclass to which an unknown instance belongs. This novel approach holds promise for advancing open world recognition in diverse applications.

class and appearance cluster, dataset, t-sne feature plot, (15 more...)

arXiv:2406.15565

Genre:

Research Report > Promising Solution (0.54)
Overview > Innovation (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
