AITopics | Object-Oriented Architecture

Collaborating Authors

Object-Oriented Architecture

News Overviews Instructional Materials AI-Alerts Classics

Inferring Pluggable Types with Machine Learning

Siddiqui, Kazi Amanul Islam, Kellogg, Martin

arXiv.org Artificial IntelligenceJun-21-2024

Pluggable type systems allow programmers to extend the type system of a programming language to enforce semantic properties defined by the programmer. Pluggable type systems are difficult to deploy in legacy codebases because they require programmers to write type annotations manually. This paper investigates how to use machine learning to infer type qualifiers automatically. We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers. We evaluate several model architectures for inferring type qualifiers, including Graph Transformer Network, Graph Convolutional Network and Large Language Model. We further validated these models by applying them to 12 open-source programs from a prior evaluation of the NullAway pluggable typechecker, lowering warnings in all but one unannotated project. We discovered that GTN shows the best performance, with a recall of .89 and precision of 0.6. Furthermore, we conduct a study to estimate the number of Java classes needed for good performance of the trained model. For our feasibility study, performance improved around 16k classes, and deteriorated due to overfitting around 22k classes.

annotation, node, type system, (15 more...)

arXiv.org Artificial Intelligence

2406.15676

Country:

North America > United States > New Jersey (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(10 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.67)

Add feedback

Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning

Jiang, Zhuohang, Tong, Bingkui, Du, Xia, Alhammadi, Ahmed, Zhou, Jizhe

arXiv.org Artificial IntelligenceJun-18-2024

The Privacy-sensitive Object Identification (POI) task allocates bounding boxes for privacy-sensitive objects in a scene. The key to POI is settling an object's privacy class (privacy-sensitive or non-privacy-sensitive). In contrast to conventional object classes which are determined by the visual appearance of an object, one object's privacy class is derived from the scene contexts and is subject to various implicit factors beyond its visual appearance. That is, visually similar objects may be totally opposite in their privacy classes. To explicitly derive the objects' privacy class from the scene contexts, in this paper, we interpret the POI task as a visual reasoning task aimed at the privacy of each object in the scene. Following this interpretation, we propose the PrivacyGuard framework for POI. PrivacyGuard contains three stages. i) Structuring: an unstructured image is first converted into a structured, heterogeneous scene graph that embeds rich scene contexts. ii) Data Augmentation: a contextual perturbation oversampling strategy is proposed to create slightly perturbed privacy-sensitive objects in a scene graph, thereby balancing the skewed distribution of privacy classes. iii) Hybrid Graph Generation & Reasoning: the balanced, heterogeneous scene graph is then transformed into a hybrid graph by endowing it with extra "node-node" and "edge-edge" homogeneous paths. These homogeneous paths allow direct message passing between nodes or edges, thereby accelerating reasoning and facilitating the capturing of subtle context changes. Based on this hybrid graph... **For the full abstract, see the original paper.**

dataset, graph, information, (16 more...)

arXiv.org Artificial Intelligence

2406.12736

Country: Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture

Xiao, Yubin, Wang, Di, Wu, Xuan, Wu, Yuesong, Li, Boyang, Du, Wei, Wang, Liupu, Zhou, You

arXiv.org Artificial IntelligenceJun-17-2024

Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training cost or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically, we propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder to enhance the size and distribution generalization, respectively. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space to encompass a broader range of distributional scenarios. We conduct extensive experiments on both synthetic and widely recognized real-world benchmarking datasets and compare the performance with seven baseline models. The results demonstrate the effectiveness of using ESF and DS decoder to obtain a more generalizable model and showcase their applicability to solve different VRP variants, i.e., travelling salesman problem and capacitated VRP. Notably, our proposed generic components require minimal computational resources, and can be effortlessly integrated into conventional generalization strategies to further elevate model generalization.

decoder, distribution pattern, generalization, (13 more...)

arXiv.org Artificial Intelligence

2406.06652

Country:

Asia > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Transportation > Freight & Logistics Services (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

Wang, Letian, Kim, Seung Wook, Yang, Jiawei, Yu, Cunjun, Ivanovic, Boris, Waslander, Steven L., Wang, Yue, Fidler, Sanja, Pavone, Marco, Karkus, Peter

arXiv.org Artificial IntelligenceJun-17-2024

We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images. Our first insight is to exploit per-scene optimized Neural Radiance Fields (NeRFs) by generating dense depth and virtual camera targets for training, thereby helping our model to learn 3D geometry from sparse non-overlapping image inputs. Second, to learn a semantically rich 3D representation, we propose distilling features from pre-trained 2D foundation models, such as CLIP or DINOv2, thereby enabling various downstream tasks without the need for costly 3D human annotations. To leverage these two insights, we introduce a novel model architecture with a two-stage lift-splat-shoot encoder and a parameterized sparse hierarchical voxel representation. Experimental results on the NuScenes dataset demonstrate that DistillNeRF significantly outperforms existing comparable self-supervised methods for scene reconstruction, novel view synthesis, and depth estimation; and it allows for competitive zero-shot 3D semantic occupancy prediction, as well as open-world scene understanding through distilled foundation model features. Demos and code will be available at https://distillnerf.github.io/.

prediction, proceedings, representation, (9 more...)

arXiv.org Artificial Intelligence

2406.12095

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.84)

Industry:

Information Technology (0.35)
Transportation > Ground > Road (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.35)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.34)

Add feedback

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Binyamin, Lital, Tewel, Yoad, Segev, Hilit, Hirsch, Eran, Rassin, Royi, Chechik, Gal

arXiv.org Artificial IntelligenceJun-14-2024

Despite the unprecedented success of text-to-image diffusion models, controlling the number of depicted objects using text is surprisingly hard. This is important for various applications from technical documents, to children's books to illustrating cooking recipes. Generating object-correct counts is fundamentally challenging because the generative model needs to keep a sense of separate identity for every instance of the object, even if several objects look identical or overlap, and then carry out a global computation implicitly during generation. It is still unknown if such representations exist. To address count-correct generation, we first identify features within the diffusion model that can carry the object identity information. We then use them to separate and count instances of objects during the denoising process and detect over-generation and under-generation. We fix the latter by training a model that predicts both the shape and location of a missing object, based on the layout of existing ones, and show how it can be used to guide denoising with correct object count. Our approach, CountGen, does not depend on external source to determine object layout, but rather uses the prior from the diffusion model itself, creating prompt-dependent and seed-dependent layouts. Evaluated on two benchmark datasets, we find that CountGen strongly outperforms the count-accuracy of existing baselines.

countgen, diffusion model, layout, (16 more...)

arXiv.org Artificial Intelligence

2406.1021

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Quebec City (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.87)

Add feedback

3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation

Liu, Xinyu, Zhang, Jing, Zhang, Kexin, Yang, Yuting, Jiao, Licheng, Yang, Shuyuan

arXiv.org Artificial IntelligenceJun-5-2024

Motion Expression guided Video Segmentation (MeViS) Track is designed to advance the study of natural languageguided Video Object Segmentation (VOS) is a vital task in computer video understanding in complex environments, with vision, focusing on distinguishing foreground objects the goal of fostering the development of a more comprehensive from the background across video frames. Our work draws and robust pixel-level understanding of video scenes in inspiration from the Cutie model, and we investigate the effects such settings and realistic scenarios through the inclusion of of object memory, the total number of memory frames, new videos, sentences, and annotations [7].

segmentation, video, video object segmentation, (14 more...)

arXiv.org Artificial Intelligence

2406.03668

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.36)

Add feedback

Precision and Adaptability of YOLOv5 and YOLOv8 in Dynamic Robotic Environments

Kich, Victor A., Muttaqien, Muhammad A., Toyama, Junya, Miyoshi, Ryutaro, Ida, Yosuke, Ohya, Akihisa, Date, Hisashi

arXiv.org Artificial IntelligenceJun-1-2024

Recent advancements in real-time object detection frameworks have spurred extensive research into their application in robotic systems. This study provides a comparative analysis of YOLOv5 and YOLOv8 models, challenging the prevailing assumption of the latter's superiority in performance metrics. Contrary to initial expectations, YOLOv5 models demonstrated comparable, and in some cases superior, precision in object detection tasks. Our analysis delves into the underlying factors contributing to these findings, examining aspects such as model architecture complexity, training dataset variances, and real-world applicability. Through rigorous testing and an ablation study, we present a nuanced understanding of each model's capabilities, offering insights into the selection and optimization of object detection frameworks for robotic applications. Implications of this research extend to the design of more efficient and contextually adaptive systems, emphasizing the necessity for a holistic approach to evaluating model performance.

artificial intelligence, machine learning, object-oriented architecture, (13 more...)

arXiv.org Artificial Intelligence

2406.00315

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.11)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.49)

Add feedback

Text Guided Image Editing with Automatic Concept Locating and Forgetting

Li, Jia, Hu, Lijie, He, Zhixian, Zhang, Jingfeng, Zheng, Tianhang, Wang, Di

arXiv.org Artificial IntelligenceMay-30-2024

With the advancement of image-to-image diffusion models guided by text, significant progress has been made in image editing. However, a persistent challenge remains in seamlessly incorporating objects into images based on textual instructions, without relying on extra user-provided guidance. Text and images are inherently distinct modalities, bringing out difficulties in fully capturing the semantic intent conveyed through language and accurately translating that into the desired visual modifications. Therefore, text-guided image editing models often produce generations with residual object attributes that do not fully align with human expectations. To address this challenge, the models should comprehend the image content effectively away from a disconnect between the provided textual editing prompts and the actual modifications made to the image. In our paper, we propose a novel method called Locate and Forget (LaF), which effectively locates potential target concepts in the image for modification by comparing the syntactic trees of the target prompt and scene descriptions in the input image, intending to forget their existence clues in the generated image. Compared to the baselines, our method demonstrates its superiority in text-guided image editing tasks both qualitatively and quantitatively.

arxiv preprint arxiv, diffusion model, editing, (15 more...)

arXiv.org Artificial Intelligence

2405.19708

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Missouri (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.48)

Industry: Media > Photography (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

Cross-Category Functional Grasp Tansfer

Wu, Rina, Zhu, Tianqiang, Lin, Xiangbo, Sun, Yi

arXiv.org Artificial IntelligenceMay-20-2024

Generating grasps for a dexterous hand often requires numerous grasping annotations. However, annotating high DoF dexterous hand poses is quite challenging. Especially for functional grasps, the grasp pose must be convenient for subsequent manipulation tasks. This prompt us to explore how people achieve manipulations on new objects based on past grasp experiences. We find that when grasping new items, people are adept at discovering and leveraging various similarities between objects, including shape, layout, and grasp type. Considering this, we analyze and collect grasp-related similarity relationships among 51 common tool-like object categories and annotate semantic grasp representation for 1768 objects. These objects are connected through similarities to form a knowledge graph, which helps infer our proposed cross-category functional grasp synthesis. Through extensive experiments, we demonstrate that the grasp-related knowledge indeed contributed to achieving functional grasp transfer across unknown or entirely new categories of objects. We will publicly release the dataset and code to facilitate future research.

category, functional grasp, similarity, (16 more...)

arXiv.org Artificial Intelligence

2405.0831

Country:

Asia > China > Liaoning Province > Dalian (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.67)

Add feedback

Dexterous Functional Pre-Grasp Manipulation with Diffusion Policy

Wu, Tianhao, Gan, Yunchong, Wu, Mingdong, Cheng, Jingbo, Yang, Yaodong, Zhu, Yixin, Dong, Hao

arXiv.org Artificial IntelligenceMay-5-2024

In real-world scenarios, objects often require repositioning and reorientation before they can be grasped, a process known as pre-grasp manipulation. Learning universal dexterous functional pre-grasp manipulation requires precise control over the relative position, orientation, and contact between the hand and object while generalizing to diverse dynamic scenarios with varying objects and goal poses. To address this challenge, we propose a teacher-student learning approach that utilizes a novel mutual reward, incentivizing agents to optimize three key criteria jointly. Additionally, we introduce a pipeline that employs a mixture-of-experts strategy to learn diverse manipulation policies, followed by a diffusion policy to capture complex action distributions from these experts. Our method achieves a success rate of 72.6\% across more than 30 object categories by leveraging extrinsic dexterity and adjusting from feedback.

dexterous functional pre-grasp manipulation, manipulation, pre-grasp manipulation, (13 more...)

arXiv.org Artificial Intelligence

2403.12421

Country: Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (0.40)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.35)

Add feedback