AITopics | Object-Oriented Architecture

Object-Oriented Architecture

Human Gaze Boosts Object-Centered Representation Learning

Schaumlöffel, Timothy, Aubret, Arthur, Roig, Gemma, Triesch, Jochen

arXiv.org Artificial Intelligence Jan-6-2025

Recent self-supervised learning (SSL) models trained on human-like egocentric visual inputs substantially underperform humans on image recognition tasks. These models train on raw, uniform visual inputs collected from head-mounted cameras. Human vision differs: the anatomical structure of the retina and visual cortex amplifies central visual information, i.e., the region around the gaze location. This selective amplification likely aids in forming object-centered visual representations. Here, we investigate whether focusing on central visual information boosts egocentric visual object learning. We simulate five months of egocentric visual experience using the large-scale Ego4D dataset and generate gaze locations with a human gaze prediction model. To account for the importance of central vision in humans, we crop the visual area around the gaze location. Finally, we train a time-based SSL model on these modified inputs. Our experiments demonstrate that focusing on central vision leads to better object-centered representations. Our analysis shows that the SSL model leverages the temporal dynamics of the gaze movements to build stronger visual representations. Overall, our work marks a significant step toward bio-inspired learning of visual representations.
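The cropping step described in the abstract above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `crop_around_gaze`, the square window, and the choice to shift the window at image borders (rather than pad) are all hypothetical.

```python
import numpy as np


def crop_around_gaze(frame, gaze_xy, crop_size):
    """Return a crop_size x crop_size window centred on the gaze point.

    Assumption: when the gaze is close to a border, the window is shifted
    back inside the frame instead of being padded, so the output always
    has the requested size.
    """
    height, width = frame.shape[:2]
    if crop_size > min(height, width):
        raise ValueError("crop_size must fit inside the frame")
    gaze_x, gaze_y = gaze_xy
    half = crop_size // 2
    # Clamp the top-left corner so the window stays inside the frame.
    left = int(np.clip(gaze_x - half, 0, width - crop_size))
    top = int(np.clip(gaze_y - half, 0, height - crop_size))
    return frame[top:top + crop_size, left:left + crop_size]
```

In a setup like the one described, the gaze coordinates would come from a gaze prediction model and the crops would be fed to the SSL model in temporal order; both of those stages are outside this sketch.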

artificial intelligence, machine learning, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2501.02966

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Materials > Chemicals > Industrial Gases > Liquified Gas (0.93)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.93)
Energy > Oil & Gas > Midstream (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Attribute-Based Robotic Grasping with Data-Efficient Adaptation

Yang, Yang, Yu, Houjian, Lou, Xibai, Liu, Yuanhao, Choi, Changhyun

arXiv.org Artificial Intelligence Jan-3-2025

Robotic grasping is one of the most fundamental robotic manipulation tasks and has been the subject of extensive research. However, swiftly teaching a robot to grasp a novel target object in clutter remains challenging. This paper attempts to address the challenge by leveraging object attributes that facilitate recognition, grasping, and rapid adaptation to new domains. In this work, we present an end-to-end encoder-decoder network to learn attribute-based robotic grasping with data-efficient adaptation capability. We first pre-train the end-to-end model with a variety of basic objects to learn generic attribute representation for recognition and grasping. Our approach fuses the embeddings of a workspace image and a query text using a gated-attention mechanism and learns to predict instance grasping affordances. To train the joint embedding space of visual and textual attributes, the robot utilizes object persistence before and after grasping. Our model is self-supervised in a simulation that only uses basic objects of various colors and shapes but generalizes to novel objects in new environments. To further facilitate generalization, we propose two adaptation methods, adversarial adaptation and one-grasp adaptation. Adversarial adaptation regulates the image encoder using augmented data of unlabeled images, whereas one-grasp adaptation updates the overall end-to-end model using augmented data from one grasp trial. Both adaptation methods are data-efficient and considerably improve instance grasping performance. Experimental results in both simulation and the real world demonstrate that our approach achieves over 81% instance grasping success rate on unknown objects, which outperforms several baselines by large margins.
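The abstract above does not spell out its gated-attention fusion, so the following is a generic sketch of one common form of the idea: the text embedding is projected to one sigmoid gate per image channel, and each channel of the image feature map is scaled by its gate. The function name, the shapes, and the linear projection (`weight`, `bias`) are assumptions made for illustration and may differ from the paper's network.

```python
import numpy as np


def gated_attention(image_features, text_embedding, weight, bias):
    """Scale image channels by gates computed from a text embedding.

    image_features: (C, H, W) feature map from an image encoder.
    text_embedding: (D,) vector from a text encoder.
    weight, bias:   (C, D) and (C,) parameters of a hypothetical
                    linear layer that maps the text to C gates.
    """
    logits = weight @ text_embedding + bias
    gate = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, one value per channel
    # Broadcast each gate over the spatial dimensions of its channel.
    return image_features * gate[:, None, None]
```

A downstream decoder would then turn the gated feature map into per-pixel grasp affordances; that part is not shown here.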

artificial intelligence, machine learning, object-oriented architecture, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TRO.2024.3353484

2501.02149

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.67)

Interacted Object Grounding in Spatio-Temporal Human-Object Interactions

Liu, Xiaoyang, Wen, Boran, Liu, Xinpeng, Zhou, Zizheng, Fan, Hongwei, Lu, Cewu, Ma, Lizhuang, Chen, Yulong, Li, Yong-Lu

arXiv.org Artificial Intelligence Dec-27-2024

Spatio-temporal Human-Object Interaction (ST-HOI) understanding aims at detecting HOIs from videos, which is crucial for activity understanding. However, existing whole-body-object interaction video benchmarks overlook the fact that open-world objects are diverse; they usually provide only limited, predefined object classes. Therefore, we introduce a new open-world benchmark, Grounding Interacted Objects (GIO), which includes 1,098 interacted object classes and 290K interacted object box annotations. Accordingly, we propose an object grounding task that expects vision systems to discover interacted objects. Even though today's detectors and grounding methods have succeeded greatly, they perform unsatisfactorily in localizing diverse and rare objects in GIO. This reveals the limitations of current vision systems and poses a great challenge. Thus, we explore leveraging spatio-temporal cues to address object grounding and propose a 4D question-answering framework (4D-QA) to discover interacted objects from diverse videos. Our method demonstrates significant superiority in extensive experiments compared to current baselines. Data and code will be publicly available at https://github.com/DirtyHarryLYL/HAKE-AVA.

detection, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.19542

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Grasp What You Want: Embodied Dexterous Grasping System Driven by Your Voice

Li, Junliang, Ye, Kai, Kang, Haolan, Liang, Mingxuan, Wu, Yuhang, Liu, Zhenhua, Zhuang, Huiping, Huang, Rui, Chen, Yongquan

arXiv.org Artificial Intelligence Dec-14-2024

In recent years, as robotics has advanced, human-robot collaboration has gained increasing importance. However, current robots struggle to fully and accurately interpret human intentions from voice commands alone. Traditional gripper and suction systems often fail to interact naturally with humans, lack advanced manipulation capabilities, and are not adaptable to diverse tasks, especially in unstructured environments. This paper introduces the Embodied Dexterous Grasping System (EDGS), designed to tackle object grasping in cluttered environments for human-robot interaction. We propose a novel approach to semantic-object alignment using a Vision-Language Model (VLM) that fuses voice commands and visual information, significantly enhancing the alignment of multi-dimensional attributes of target objects in complex scenarios. Inspired by human hand-object interactions, we develop a robust, precise, and efficient grasping strategy, incorporating principles like the thumb-object axis, multi-finger wrapping, and fingertip interaction with an object's contact mechanics. We also design experiments to assess Referring Expression Representation Enrichment (RERE) in referring expression segmentation, demonstrating that our system accurately detects and matches referring expressions. Extensive experiments confirm that EDGS can effectively handle complex grasping tasks, achieving stability and high success rates, highlighting its potential for further development in the field of Embodied AI.

information, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.10694

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.66)

Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation

Wu, Fei, Marquez-Neila, Pablo, Rafi-Tarii, Hedyeh, Sznitman, Raphael

arXiv.org Artificial Intelligence Dec-9-2024

Multi-class semantic segmentation remains a cornerstone challenge in computer vision. Yet, dataset creation remains excessively demanding in time and effort, especially for specialized domains. Active Learning (AL) mitigates this challenge by strategically selecting data points for annotation. However, existing patch-based AL methods often overlook the critical information carried by boundary pixels, which is essential for accurate segmentation. We present OREAL, a novel patch-based AL method designed for multi-class semantic segmentation. OREAL enhances boundary detection by employing maximum aggregation of pixel-wise uncertainty scores. Additionally, we introduce one-vs-rest entropy, a novel uncertainty score function that computes class-wise uncertainties while achieving implicit class balancing during dataset creation. Comprehensive experiments across diverse datasets and model architectures validate our hypothesis.
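The two ingredients named in the abstract above, a class-wise one-vs-rest uncertainty and maximum aggregation over a patch, can be sketched as follows. The abstract gives no formulas, so this is one plausible reading rather than OREAL's definition: each class probability is treated as a binary "this class vs. the rest" problem and scored with binary entropy, and a patch takes the maximum of its pixel scores. The function names and the non-overlapping patch grid are assumptions.

```python
import numpy as np


def one_vs_rest_entropy(probs, eps=1e-12):
    """Binary entropy of every class against all other classes.

    probs: (H, W, C) per-pixel class probabilities (e.g. softmax output).
    Returns an (H, W, C) array of class-wise uncertainties.
    """
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def max_aggregate(pixel_scores, patch):
    """Maximum score inside each non-overlapping patch x patch window.

    pixel_scores: (H, W) array; H and W must be multiples of patch.
    Returns an (H // patch, W // patch) array of patch scores.
    """
    height, width = pixel_scores.shape
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    blocks = pixel_scores.reshape(height // patch, patch, width // patch, patch)
    return blocks.max(axis=(1, 3))
```

One way to combine them, again as an assumption, is to reduce the class axis first (for example `one_vs_rest_entropy(probs).sum(axis=-1)`) and pass the resulting map to `max_aggregate`. Taking the maximum rather than the mean means a few highly uncertain boundary pixels are enough to rank a patch highly.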

artificial intelligence, machine learning, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2412.0647

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)

DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo

Zhu, Junzhe, Ju, Yuanchen, Zhang, Junyi, Wang, Muhan, Yuan, Zhecheng, Hu, Kaizhe, Xu, Huazhe

arXiv.org Artificial Intelligence Dec-6-2024

Dense 3D correspondence can enhance robotic manipulation by enabling the generalization of spatial, functional, and dynamic information from one object to an unseen counterpart. Compared to shape correspondence, semantic correspondence is more effective in generalizing across different object categories. DenseMatcher first computes vertex features by projecting multiview 2D features onto meshes and refining them with a 3D network, and subsequently finds dense correspondences with the obtained features using functional maps. In addition, we craft the first 3D matching dataset that contains colored object meshes across diverse categories. In our experiments, we show that DenseMatcher significantly outperforms prior 3D matching baselines by 43.5%. We demonstrate the downstream effectiveness of DenseMatcher in (i) robotic manipulation, where it achieves cross-instance and cross-category generalization on long-horizon complex manipulation tasks from observing only one demo; (ii) zero-shot color mapping between digital assets, where appearance can be transferred between different objects with relatable geometry. Correspondence plays a pivotal role in robotics (Wang, 2019). For instance, in robotic assembly, it is necessary to determine the corresponding parts between the target and source objects.

[Figure caption: Circles represent the contact points in the human demo and the grasping points for robot manipulation.]

artificial intelligence, machine learning, object-oriented architecture, (17 more...)

arXiv.org Artificial Intelligence

2412.05268

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Switzerland (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Deep Learning and Machine Learning: Advancing Big Data Analytics and Management with Design Patterns

Chen, Keyu, Bi, Ziqian, Wang, Tianyang, Wen, Yizhu, Feng, Pohsun, Niu, Qian, Liu, Junyu, Peng, Benji, Zhang, Sen, Li, Ming, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Liu, Ming

arXiv.org Artificial Intelligence Dec-6-2024

This book, Design Patterns in Machine Learning and Deep Learning: Advancing Big Data Analytics Management, presents a comprehensive study of essential design patterns tailored for large-scale machine learning and deep learning applications. The book explores the application of classical software engineering patterns, Creational, Structural, Behavioral, and Concurrency Patterns, to optimize the development, maintenance, and scalability of big data analytics systems. Through practical examples and detailed Python implementations, it bridges the gap between traditional object-oriented design patterns and the unique demands of modern data analytics environments. Key design patterns such as Singleton, Factory, Observer, and Strategy are analyzed for their impact on model management, deployment strategies, and team collaboration, providing invaluable insights into the engineering of efficient, reusable, and flexible systems. This volume is an essential resource for developers, researchers, and engineers aiming to enhance their technical expertise in both machine learning and software design.
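As a rough illustration of how two of the patterns named in the abstract above (Strategy and Factory) might look in a data-processing setting, consider the sketch below. It is not taken from the book: every class and function name here is hypothetical, and plain Python lists stand in for real datasets.

```python
from abc import ABC, abstractmethod


class ScalingStrategy(ABC):
    """Strategy interface: one interchangeable way to scale a feature."""

    @abstractmethod
    def scale(self, values):
        ...


class MinMaxScaling(ScalingStrategy):
    def scale(self, values):
        low, high = min(values), max(values)
        span = high - low
        if span == 0:
            return [0.0 for _ in values]  # constant feature
        return [(v - low) / span for v in values]


class MeanCentering(ScalingStrategy):
    def scale(self, values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]


_STRATEGIES = {"minmax": MinMaxScaling, "center": MeanCentering}


def make_scaler(name):
    """Factory: build a strategy from a configuration string."""
    try:
        return _STRATEGIES[name]()
    except KeyError:
        raise ValueError(f"unknown scaling strategy: {name!r}") from None


class Preprocessor:
    """Context: delegates scaling to whichever strategy it was given."""

    def __init__(self, strategy):
        self.strategy = strategy

    def run(self, values):
        return self.strategy.scale(values)
```

For example, `Preprocessor(make_scaler("minmax")).run([2, 4, 6])` yields `[0.0, 0.5, 1.0]`. Swapping the configuration string changes the behavior without touching `Preprocessor`, which is the kind of flexibility the patterns are meant to provide.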

data mining, machine learning, object-oriented architecture, (19 more...)

arXiv.org Artificial Intelligence

2410.03795

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(8 more...)

Genre: Workflow (1.00)

Industry:

Banking & Finance (0.92)
Information Technology > Security & Privacy (0.46)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Object-Oriented Programming

Wang, Tianyang, Bi, Ziqian, Chen, Keyu, Xu, Jiawei, Niu, Qian, Liu, Junyu, Peng, Benji, Li, Ming, Zhang, Sen, Pan, Xuanhe, Wang, Jinlang, Feng, Pohsun, Wen, Yizhu, Liu, Ming

arXiv.org Artificial Intelligence Dec-6-2024

Object-Oriented Programming (OOP) has become a crucial paradigm for managing the growing complexity of modern software systems, particularly in fields like machine learning, deep learning, large language models (LLMs), and data analytics. This work provides a comprehensive introduction to the integration of OOP techniques within these domains, with a focus on improving code modularity, maintainability, and scalability. We begin by outlining the evolution of computing and the rise of OOP, followed by an in-depth discussion of key OOP principles such as encapsulation, inheritance, polymorphism, and abstraction. The practical application of these principles is demonstrated using Python, a widely adopted language in AI and data science. Furthermore, we examine how design patterns and modular programming can be employed to enhance the structure and efficiency of machine learning systems. In subsequent sections, we apply these OOP concepts to real-world AI tasks, including the encapsulation of preprocessing workflows, machine learning model training, and evaluation. Detailed examples illustrate how OOP can be used to build reusable, scalable machine learning systems while maintaining code clarity and reducing redundancy. This work is intended to serve as a bridge for both beginners and experienced developers, equipping them with the necessary knowledge to apply OOP methodologies in AI-driven projects, ultimately fostering the development of more robust and maintainable systems.
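The idea of encapsulating a preprocessing workflow, mentioned in the abstract above, can be illustrated with a minimal sketch that touches inheritance, polymorphism, and encapsulation. The classes below are hypothetical and are not drawn from the work itself; a single numeric column is represented as a plain list.

```python
class Step:
    """Base class for a preprocessing step (abstraction)."""

    def fit(self, rows):
        return self  # stateless steps have nothing to learn

    def transform(self, rows):
        raise NotImplementedError


class FillMissing(Step):
    """Replace None with the mean of the observed values (inheritance)."""

    def fit(self, rows):
        observed = [r for r in rows if r is not None]
        self._mean = sum(observed) / len(observed)
        return self

    def transform(self, rows):
        return [self._mean if r is None else r for r in rows]


class Clip(Step):
    """Limit values to the range [low, high]."""

    def __init__(self, low, high):
        self._low, self._high = low, high

    def transform(self, rows):
        return [min(max(r, self._low), self._high) for r in rows]


class Pipeline:
    """Hides the ordered steps behind one method (encapsulation)."""

    def __init__(self, steps):
        self._steps = list(steps)

    def fit_transform(self, rows):
        for step in self._steps:
            # Each step responds to the same calls (polymorphism).
            rows = step.fit(rows).transform(rows)
        return rows
```

Calling `Pipeline([FillMissing(), Clip(0, 5)]).fit_transform([1, None, 9])` returns `[1, 5.0, 5]`: the missing value is filled with the mean of 1 and 9, and the result is then clipped to the upper bound.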

machine learning, object-oriented architecture, python, (19 more...)

arXiv.org Artificial Intelligence

2409.19916

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Hawaii (0.04)
(9 more...)

Genre:

Instructional Material (0.67)
Research Report (0.64)
Workflow (0.48)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.68)
Automobiles & Trucks > Manufacturer (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Improving Fine-Grained Control via Aggregation of Multiple Diffusion Models

Yue, Conghan, Peng, Zhengwei, Du, Shiyan, Ji, Zhi, Cai, Chuangjian, Wan, Le, Zhang, Dongyu

arXiv.org Artificial Intelligence Dec-5-2024

While many diffusion models perform well when controlling a particular aspect such as style, character, or interaction, they struggle with fine-grained control due to dataset limitations and intricate model architecture design. This paper introduces a novel algorithm, Aggregation of Multiple Diffusion Models (AMDM), which synthesizes features from multiple diffusion models into a specified model, activating specific features for fine-grained control. Experimental results demonstrate that AMDM significantly improves fine-grained control without training, proving its effectiveness. Additionally, it reveals that diffusion models initially focus on features such as position, attributes, and style, with later stages improving generation quality and consistency. AMDM offers a new perspective for tackling the challenges of fine-grained conditional control generation in diffusion models: we can fully utilize existing conditional diffusion models that control specific aspects, or develop new ones, and then aggregate them using the AMDM algorithm. This eliminates the need for constructing complex datasets, designing intricate model architectures, and incurring high training costs. Code is available at: https://github.com/Hammour-steak/AMDM.

algorithm, diffusion model, submission and formatting instruction, (10 more...)

arXiv.org Artificial Intelligence

2410.01262

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wang, Wenbo, Wei, Fangyun, Zhou, Lei, Chen, Xi, Luo, Lin, Yi, Xiaohan, Zhang, Yizhong, Liang, Yaobo, Xu, Chang, Lu, Yan, Yang, Jiaolong, Guo, Baining

arXiv.org Artificial Intelligence Dec-3-2024

We introduce UniGraspTransformer, a universal Transformer-based network for dexterous robotic grasping that simplifies training while enhancing scalability and performance. Unlike prior methods such as UniDexGrasp++, which require complex, multi-step training pipelines, UniGraspTransformer follows a streamlined process: first, dedicated policy networks are trained for individual objects using reinforcement learning to generate successful grasp trajectories; then, these trajectories are distilled into a single, universal network. Our approach enables UniGraspTransformer to scale effectively, incorporating up to 12 self-attention blocks for handling thousands of objects with diverse poses. Additionally, it generalizes well to both idealized and real-world inputs, evaluated in state-based and vision-based settings. Notably, UniGraspTransformer generates a broader range of grasping poses for objects in various shapes and orientations, resulting in more diverse grasp strategies. Experimental results demonstrate significant improvements over the state-of-the-art UniDexGrasp++ across various object categories, achieving success rate gains of 3.5%, 7.7%, and 10.1% on seen objects, unseen objects within seen categories, and completely unseen objects, respectively, in the vision-based setting. Project page: https://dexhand.github.io/UniGraspTransformer.

machine learning, object-oriented architecture, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2412.02699

Country:

Asia > Singapore (0.04)
Asia > China > Shandong Province > Dongying (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)
