AITopics | Object-Oriented Architecture

Collaborating Authors

Object-Oriented Architecture

News Overviews Instructional Materials AI-Alerts Classics

Learn Python for software engineering for just 25

PCWorldJan-15-2024, 08:00:00 GMT

Python is the world's most popular programming language because it's relatively easy to learn and its general-purpose nature makes it practical in many places. For instance, if you're interested in software engineering, learning Python can be a great asset. Get it done this year with The 2024 Python for Software Engineering Bootcamp Certification Bundle. This bundle includes seven courses and 165 hours of training from Packt Publishing, The beginner-friendly courses give you an introduction to Python, its syntax, and the concept of object-oriented programming. You'll hone your skills with a variety of real-life projects, getting practical software engineering skills that you'll be able to apply in the real world.

object-oriented architecture, programming language, python, (5 more...)

PCWorld

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Software > Programming Languages (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.65)

Add feedback

Learn core programming principles with this 25 Java course

PCWorldJan-12-2024, 10:00:00 GMT

One of the world's most popular programming languages, Java has been a core tenet of web development for decades. If you want to get into coding this year, The 2024 Java Programming Certification Bundle is a great place to start, especially now that it's just 24.99 for a limited time. This extensive bundle includes seven courses and more than 85 hours of coursework from Packet Publishing (five-star rated). The beginner-friendly bundle will introduce you to object-oriented programming, Java programming, Java design patterns, and more. You'll also learn how to build modern distributed systems, tap into other Java libraries, and much more.

artificial intelligence, object-oriented architecture, programming language, (4 more...)

PCWorld

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.66)

Add feedback

HomeRobot: Open-Vocabulary Mobile Manipulation

Yenamandra, Sriram, Ramachandran, Arun, Yadav, Karmesh, Wang, Austin, Khanna, Mukul, Gervet, Theophile, Yang, Tsung-Yen, Jain, Vidhi, Clegg, Alexander William, Turner, John, Kira, Zsolt, Savva, Manolis, Chang, Angel, Chaplot, Devendra Singh, Batra, Dhruv, Mottaghi, Roozbeh, Bisk, Yonatan, Paxton, Chris

arXiv.org Artificial IntelligenceJan-10-2024

HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integration of the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, where an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways future research work improve performance. See videos on our website: https://ovmm.github.io/.

agent, receptacle, robot, (16 more...)

arXiv.org Artificial Intelligence

2306.11565

Country:

North America > United States (0.46)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Add feedback

SOS-SLAM: Segmentation for Open-Set SLAM in Unstructured Environments

Kinnari, Jouko, Thomas, Annika, Lusk, Parker, Kondo, Kota, How, Jonathan P.

arXiv.org Artificial IntelligenceJan-9-2024

We present a novel framework for open-set Simultaneous Localization and Mapping (SLAM) in unstructured environments that uses segmentation to create a map of objects and geometric relationships between objects for localization. Our system consists of 1) a front-end mapping pipeline using a zero-shot segmentation model to extract object masks from images and track them across frames to generate an object-based map and 2) a frame alignment pipeline that uses the geometric consistency of objects to efficiently localize within maps taken in a variety of conditions. This approach is shown to be more robust to changes in lighting and appearance than traditional feature-based SLAM systems or global descriptor methods. This is established by evaluating SOS-SLAM on the Batvik seasonal dataset which includes drone flights collected over a coastal plot of southern Finland during different seasons and lighting conditions. Across flights during varying environmental conditions, our approach achieves higher recall than benchmark methods with precision of 1.0. SOS-SLAM localizes within a reference map up to 14x faster than other feature based approaches and has a map size less than 0.4% the size of the most compact other maps. When considering localization performance from varying viewpoints, our approach outperforms all benchmarks from the same viewpoint and most benchmarks from different viewpoints. SOS-SLAM is a promising new approach for SLAM in unstructured environments that is robust to changes in lighting and appearance and is more computationally efficient than other approaches. We release our code and datasets: https://acl.mit.edu/SOS-SLAM/.

correspondence, flight, unstructured environment, (16 more...)

arXiv.org Artificial Intelligence

2401.04791

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.24)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Europe > Austria > Styria > Graz (0.04)

Genre: Research Report (0.64)

Industry: Transportation (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)
(2 more...)

Add feedback

Object-oriented backdoor attack against image captioning

Li, Meiling, Zhong, Nan, Zhang, Xinpeng, Qian, Zhenxing, Li, Sheng

arXiv.org Artificial IntelligenceJan-4-2024

Backdoor attack against image classification task has been widely studied and proven to be successful, while there exist little research on the backdoor attack against vision-language models. In this paper, we explore backdoor attack towards image captioning models by poisoning training data. Assuming the attacker has total access to the training dataset, and cannot intervene in model construction or training process. Specifically, a portion of benign training samples is randomly selected to be poisoned. Afterwards, considering that the captions are usually unfolded around objects in an image, we design an object-oriented method to craft poisons, which aims to modify pixel values by a slight range with the modification number proportional to the scale of the current detected object region. After training with the poisoned data, the attacked model behaves normally on benign images, but for poisoned images, the model will generate some sentences irrelevant to the given image. The attack controls the model behavior on specific test images without sacrificing the generation performance on benign test images. Our method proves the weakness of image captioning models to backdoor attack and we hope this work can raise the awareness of defending against backdoor attack in the image captioning field.

backdoor attack, caption, dataset, (15 more...)

arXiv.org Artificial Intelligence

2401.026

Country: Asia > China > Shanghai > Shanghai (0.05)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.63)

Add feedback

ODIN: A Single Model for 2D and 3D Perception

Jain, Ayush, Katara, Pushkal, Gkanatsios, Nikolaos, Harley, Adam W., Sarch, Gabriel, Aggarwal, Kriti, Chaudhary, Vishrav, Fragkiadaki, Katerina

arXiv.org Artificial IntelligenceJan-4-2024

State-of-the-art models on contemporary 3D perception benchmarks like ScanNet consume and label dataset-provided 3D point clouds, obtained through post processing of sensed multiview RGB-D images. They are typically trained in-domain, forego large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead. The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures. In this paper, we challenge this view and propose ODIN (Omni-Dimensional INstance segmentation), a model that can segment and label both 2D RGB images and 3D point clouds, using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. Our model differentiates 2D and 3D feature operations through the positional encodings of the tokens involved, which capture pixel coordinates for 2D patch tokens and 3D coordinates for 3D feature tokens. ODIN achieves state-of-the-art performance on ScanNet200, Matterport3D and AI2THOR 3D instance segmentation benchmarks, and competitive performance on ScanNet, S3DIS and COCO. It outperforms all previous works by a wide margin when the sensed 3D point cloud is used in place of the point cloud sampled from 3D mesh. When used as the 3D perception engine in an instructable embodied agent architecture, it sets a new state-of-the-art on the TEACh action-from-dialogue benchmark. Our code and checkpoints can be found at the project website: https://odin-seg.github.io.

point cloud, proceedings, segmentation, (15 more...)

arXiv.org Artificial Intelligence

2401.02416

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.54)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)

Add feedback

Merging Vision Transformers from Different Tasks and Domains

Ye, Peng, Huang, Chenyu, Shen, Mingzhu, Chen, Tao, Huang, Yongqi, Zhang, Yuning, Ouyang, Wanli

arXiv.org Artificial IntelligenceDec-25-2023

This work targets to merge various Vision Transformers (ViTs) trained on different tasks (i.e., datasets with different object categories) or domains (i.e., datasets with the same categories but different environments) into one unified model, yielding still good performance on each task or domain. Previous model merging works focus on either CNNs or NLP models, leaving the ViTs merging research untouched. To fill this gap, we first explore and find that existing model merging methods cannot well handle the merging of the whole ViT models and still have improvement space. To enable the merging of the whole ViT, we propose a simple-but-effective gating network that can both merge all kinds of layers (e.g., Embedding, Norm, Attention, and MLP) and select the suitable classifier. Specifically, the gating network is trained by unlabeled datasets from all the tasks (domains), and predicts the probability of which task (domain) the input belongs to for merging the models during inference. To further boost the performance of the merged model, especially when the difficulty of merging tasks increases, we design a novel metric of model weight similarity, and utilize it to realize controllable and combined weight merging. Comprehensive experiments on kinds of newly established benchmarks, validate the superiority of the proposed ViT merging framework for different tasks and domains. Our method can even merge beyond 10 ViT models from different vision tasks with a negligible effect on the performance of each task.

dataset, different task, similarity, (14 more...)

arXiv.org Artificial Intelligence

2312.1624

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.34)

Add feedback

Object Attribute Matters in Visual Question Answering

Li, Peize, Si, Qingyi, Fu, Peng, Lin, Zheng, Wang, Yan

arXiv.org Artificial IntelligenceDec-20-2023

Visual question answering is a multimodal task that requires the joint comprehension of visual and textual information. However, integrating visual and textual semantics solely through attention layers is insufficient to comprehensively understand and align information from both modalities. Intuitively, object attributes can naturally serve as a bridge to unify them, which has been overlooked in previous research. In this paper, we propose a novel VQA approach from the perspective of utilizing object attribute, aiming to achieve better object-level visual-language alignment and multimodal scene understanding. Specifically, we design an attribute fusion module and a contrastive knowledge distillation module. The attribute fusion module constructs a multimodal graph neural network to fuse attributes and visual features through message passing. The enhanced object-level visual features contribute to solving fine-grained problem like counting-question. The better object-level visual-language alignment aids in understanding multimodal scenes, thereby improving the model's robustness. Furthermore, to augment scene understanding and the out-of-distribution performance, the contrastive knowledge distillation module introduces a series of implicit knowledge. We distill knowledge into attributes through contrastive loss, which further strengthens the representation learning of attribute features and facilitates visual-linguistic alignment. Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method.

dataset, information, knowledge, (17 more...)

arXiv.org Artificial Intelligence

2401.09442

Country:

Asia > China > Jilin Province > Changchun (0.04)
Asia > China > Beijing > Beijing (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.75)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.56)

Add feedback

SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing

Wang, Zhecheng, Prabha, Rajanie, Huang, Tianyuan, Wu, Jiajun, Rajagopal, Ram

arXiv.org Artificial IntelligenceDec-20-2023

Remote sensing imagery, despite its broad applications in helping achieve Sustainable Development Goals and tackle climate change, has not yet benefited from the recent advancements of versatile, task-agnostic vision language models (VLMs). A key reason is that the large-scale, semantically diverse image-text dataset required for developing VLMs is still absent for remote sensing images. Unlike natural images, remote sensing images and their associated text descriptions cannot be efficiently collected from the public Internet at scale. In this work, we bridge this gap by using geo-coordinates to automatically connect open, unlabeled remote sensing images with rich semantics covered in OpenStreetMap, and thus construct SkyScript, a comprehensive vision-language dataset for remote sensing images, comprising 2.6 million image-text pairs covering 29K distinct semantic tags. With continual pre-training on this dataset, we obtain a VLM that surpasses baseline models with a 6.2% average accuracy gain in zero-shot scene classification across seven benchmark datasets. It also demonstrates the ability of zero-shot transfer for fine-grained object attribute classification and cross-modal retrieval. We hope this dataset can support the advancement of VLMs for various multi-modal tasks in remote sensing, such as open-vocabulary classification, retrieval, captioning, and text-to-image synthesis.

classification, dataset, skyscript, (17 more...)

arXiv.org Artificial Intelligence

2312.12856

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Spain (0.04)
Europe > Finland (0.04)
(6 more...)

Genre: Research Report (0.65)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.66)

Add feedback

Benchmarks for Physical Reasoning AI

Melnik, Andrew, Schiewer, Robin, Lange, Moritz, Muresanu, Andrei, Saeidi, Mozhgan, Garg, Animesh, Ritter, Helge

arXiv.org Artificial IntelligenceDec-17-2023

Physical reasoning is a crucial aspect in the development of general AI systems, given that human learning starts with interacting with the physical world before progressing to more complex concepts. Although researchers have studied and assessed the physical reasoning of AI approaches through various specific benchmarks, there is no comprehensive approach to evaluating and measuring progress. Therefore, we aim to offer an overview of existing benchmarks and their solution approaches and propose a unified perspective for measuring the physical reasoning capacity of AI systems. We select benchmarks that are designed to test algorithmic performance in physical reasoning tasks. While each of the selected benchmarks poses a unique challenge, their ensemble provides a comprehensive proving ground for an AI generalist agent with a measurable skill level for various physical reasoning concepts. This gives an advantage to such an ensemble of benchmarks over other holistic benchmarks that aim to simulate the real world by intertwining its complexity and many concepts. We group the presented set of physical reasoning benchmarks into subcategories so that more narrow generalist AI agents can be tested first on these groups.

agent, arxiv preprint arxiv, benchmark, (15 more...)

arXiv.org Artificial Intelligence

2312.10728

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Italy (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Add feedback