AITopics | Object-Oriented Architecture

Collaborating Authors

Object-Oriented Architecture

News Overviews Instructional Materials AI-Alerts Classics

SegVec3D: A Method for Vector Embedding of 3D Objects Oriented Towards Robot manipulation

arXiv.org Artificial IntelligenceJul-15-2025

However, due to their inherent sparsity, disorder, and lack of structure, instance-level semantic understanding of point clouds remains challenging - particularly under conditions of limited supervision and cross-modal semantic ambiguity. To address these issues, we propose SegV ec3D, a novel framework integrating attention mechanisms, embedding learning, and cross-modal alignment techniques for 3D point cloud instance segmentation. The proposed approach first builds a hierarchical instance feature extractor based on spatial adjacency and attention computation, enhancing the model's ability to capture fine-grained geometric structures. It then introduces a high-dimensional embedding space, enabling unsupervised instance segmentation through a contrastive-learning-based clustering mechanism. Furthermore, a shared cross-modal semantic space is constructed to align 3D data with natural language descriptions, allowing zero-shot understanding and retrieval of 3D objects given text queries. The model is ultimately deployed and validated in realistic scenarios, demonstrating strong generalizability and engineering feasibility. While recent methods like Mask3D [40] and ULIP [10][11] have advanced 3D segmentation and vision-language pre-training respectively, our approach uniquely integrates these domains by enabling instance segmentation with minimal labeling and directly aligning point clouds with language. Experimental evaluations confirm that the proposed method achieves high semantic discriminability, robust multi-modal alignment, and practical deployabil-ity. It supports weakly-supervised or unsupervised 3D instance understanding, providing a promising foundation for future multi-modal cognitive robotic systems.

machine learning, natural language, object-oriented architecture, (13 more...)

arXiv.org Artificial Intelligence

2507.09459

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

ObjectRL: An Object-Oriented Reinforcement Learning Codebase

Baykal, Gulcin, Akgül, Abdullah, Haussmann, Manuel, Tasdighi, Bahareh, Werge, Nicklas, Wu, Yi-Shan, Kandemir, Melih

arXiv.org Artificial IntelligenceJul-8-2025

ObjectRL is an open-source Python codebase for deep reinforcement learning (RL), designed for research-oriented prototyping with minimal programming effort. Unlike existing codebases, ObjectRL is built on Object-Oriented Programming (OOP) principles, providing a clear structure that simplifies the implementation, modification, and evaluation of new algorithms. ObjectRL lowers the entry barrier for deep RL research by organizing best practices into explicit, clearly separated components, making them easier to understand and adapt. Each algorithmic component is a class with attributes that describe key RL concepts and methods that intuitively reflect their interactions. The class hierarchy closely follows common ontological relationships, enabling data encapsulation, inheritance, and polymorphism, which are core features of OOP. We demonstrate the efficiency of ObjectRL's design through representative use cases that highlight its flexibility and suitability for rapid prototyping. The documentation and source code are available at https://objectrl.readthedocs.io and https://github.com/adinlab/objectrl .

machine learning, object-oriented architecture, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2507.03487

Country:

Europe > Portugal > Braga > Braga (0.04)
Europe > Denmark > Southern Denmark (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Education > Curriculum > Subject-Specific Education (0.42)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

ArtGS:3D Gaussian Splatting for Interactive Visual-Physical Modeling and Manipulation of Articulated Objects

Yu, Qiaojun, Yuan, Xibin, jiang, Yu, Chen, Junting, Zheng, Dongzhe, Hao, Ce, You, Yang, Chen, Yixing, Mu, Yao, Liu, Liu, Lu, Cewu

arXiv.org Artificial IntelligenceJul-4-2025

Articulated object manipulation remains a critical challenge in robotics due to the complex kinematic constraints and the limited physical reasoning of existing methods. In this work, we introduce ArtGS, a novel framework that extends 3D Gaussian Splatting (3DGS) by integrating visual-physical modeling for articulated object understanding and interaction. ArtGS begins with multi-view RGB-D reconstruction, followed by reasoning with a vision-language model (VLM) to extract semantic and structural information, particularly the articulated bones. Through dynamic, differentiable 3DGS-based rendering, ArtGS optimizes the parameters of the articulated bones, ensuring physically consistent motion constraints and enhancing the manipulation policy. By leveraging dynamic Gaussian splatting, cross-embodiment adaptability, and closed-loop optimization, ArtGS establishes a new framework for efficient, scalable, and generalizable articulated object modeling and manipulation. Experiments conducted in both simulation and real-world environments demonstrate that ArtGS significantly outperforms previous methods in joint estimation accuracy and manipulation success rates across a variety of articulated objects. Additional images and videos are available on the project website: https://sites.google.com/view/artgs/home

artificial intelligence, manipulation, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2507.026

Country: Asia > China (0.29)

Genre: Research Report (0.82)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.35)

Add feedback

LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans

Huang, Zhening, Wu, Xiaoyang, Zhong, Fangcheng, Zhao, Hengshuang, Nießner, Matthias, Lasenby, Joan

arXiv.org Artificial IntelligenceJul-4-2025

We propose LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic, and interactive 3D virtual replicas. LiteReality not only reconstructs scenes that visually resemble reality but also supports key features essential for graphics pipelines -- such as object individuality, articulation, high-quality physically based rendering materials, and physically based interaction. At its core, LiteReality first performs scene understanding and parses the results into a coherent 3D layout and objects with the help of a structured scene graph. It then reconstructs the scene by retrieving the most visually similar 3D artist-crafted models from a curated asset database. Next, the Material Painting module enhances realism by recovering high-quality, spatially varying materials. Finally, the reconstructed scene is integrated into a simulation engine with basic physical properties to enable interactive behavior. The resulting scenes are compact, editable, and fully compatible with standard graphics pipelines, making them suitable for applications in AR/VR, gaming, robotics, and digital twins. In addition, LiteReality introduces a training-free object retrieval module that achieves state-of-the-art similarity performance on the Scan2CAD benchmark, along with a robust material painting module capable of transferring appearances from images of any style to 3D assets -- even under severe misalignment, occlusion, and poor lighting. We demonstrate the effectiveness of LiteReality on both real-life scans and public datasets. Project page: https://litereality.github.io; Video: https://www.youtube.com/watch?v=ecK9m3LXg2c

machine learning, natural language, object-oriented architecture, (21 more...)

arXiv.org Artificial Intelligence

2507.02861

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

Teaching Programming in the Age of Generative AI: Insights from Literature, Pedagogical Proposals, and Student Perspectives

Rubio-Manzano, Clemente, Meza, Jazna, Fernandez-Santibanez, Rodolfo, Vidal-Castro, Christian

arXiv.org Artificial IntelligenceJul-2-2025

Computer programming is undergoing a true transformation driven by powerful new tools for automatic source code generation based on large language models. This transformation is also manifesting in introductory programming courses at universities around the world, generating an in-depth debate about how programming content should be taught, learned, and assessed in the context of generative artificial intelligence. This article aims, on the one hand, to review the most relevant studies on this issue, highlighting the advantages and disadvantages identified in the specialized literature. On the other hand, it proposes enriching teaching and learning methodologies by focusing on code comprehension and execution rather than on mere coding or program functionality. In particular, it advocates for the use of visual representations of code and visual simulations of its execution as effective tools for teaching, learning, and assessing programming, thus fostering a deeper understanding among students. Finally, the opinions of students who took the object-oriented programming course are presented to provide preliminary context supporting the incorporation of visual simulations in Java (or other languages) as part of the training process.

machine learning, natural language, programming language, (20 more...)

arXiv.org Artificial Intelligence

2507.00108

Genre: Instructional Material (0.93)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.85)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.70)

Add feedback

Personalized Robotic Object Rearrangement from Scene Context

Ramachandruni, Kartik, Chernova, Sonia

arXiv.org Artificial IntelligenceJun-30-2025

Object rearrangement is a key task for household robots requiring personalization without explicit instructions, meaningful object placement in environments occupied with objects, and generalization to unseen objects and new environments. To facilitate research addressing these challenges, we introduce PARSEC, an object rearrangement benchmark for learning user organizational preferences from observed scene context to place objects in a partially arranged environment. PARSEC is built upon a novel dataset of 110K rearrangement examples crowdsourced from 72 users, featuring 93 object categories and 15 environments. To better align with real-world organizational habits, we propose ContextSortLM, an LLM-based personalized rearrangement model that handles flexible user preferences by explicitly accounting for objects with multiple valid placement locations when placing items in partially arranged environments. We evaluate ContextSortLM and existing personalized rearrangement approaches on the PARSEC benchmark and complement these findings with a crowdsourced evaluation of 108 online raters ranking model predictions based on alignment with user preferences. Our results indicate that personalized rearrangement models leveraging multiple scene context sources perform better than models relying on a single context source. Moreover, ContextSortLM outperforms other models in placing objects to replicate the target user's arrangement and ranks among the top two in all three environment categories, as rated by online evaluators. Importantly, our evaluation highlights challenges associated with modeling environment semantics across different environment categories and provides recommendations for future work.

large language model, natural language, object-oriented architecture, (22 more...)

arXiv.org Artificial Intelligence

2505.11108

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.59)
(3 more...)

Add feedback

Embodied Domain Adaptation for Object Detection

Shi, Xiangyu, Qiao, Yanyuan, Liu, Lingqiao, Dayoub, Feras

arXiv.org Artificial IntelligenceJun-30-2025

-- Mobile robots rely on object detectors for perception and object localization in indoor environments. However, standard closed-set methods struggle to handle the diverse objects and dynamic conditions encountered in real homes and labs. Open-vocabulary object detection (OVOD), driven by Vision Language Models (VLMs), extends beyond fixed labels but still struggles with domain shifts in indoor environments. We introduce a Source-Free Domain Adaptation (SFDA) approach that adapts a pre-trained model without accessing source data. We refine pseudo labels via temporal clustering, employ multi-scale threshold fusion, and apply a Mean T eacher framework with contrastive learning. Our Embodied Domain Adaptation for Object Detection (EDAOD) benchmark evaluates adaptation under sequential changes in lighting, layout, and object diversity. Our experiments show significant gains in zero-shot detection performance and flexible adaptation to dynamic indoor conditions. I. INTRODUCTION Robust object detection is pivotal for mobile robots performing tasks like semantic mapping, navigation, and object interaction in indoor environments.

adaptation, artificial intelligence, object-oriented architecture, (15 more...)

arXiv.org Artificial Intelligence

2506.2186

Genre: Research Report (0.50)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)

Add feedback

CF-VLM:CounterFactual Vision-Language Fine-tuning

Zhang, Jusheng, Cai, Kaitong, Fan, Yijia, Wang, Jian, Wang, Keze

arXiv.org Artificial IntelligenceJun-24-2025

Recent advances in vision-language models (VLMs) have greatly improved cross-modal semantic understanding, yet significant limitations remain in fine-grained discrimination and deep causal reasoning tasks. Existing VLMs often rely on superficial statistical correlations, lacking the ability to capture the underlying causal logic between visual and textual content. To address this, we propose CounterFactual Vision-Language Fine-tuning (CF-VLM), a novel framework that enhances the causal reasoning capabilities of VLMs through the targeted use of counterfactual samples. CF-VLM introduces three complementary training objectives: maintaining foundational cross-modal alignment, reinforcing the uniqueness and stability of factual scene representations against coherent counterfactuals, and sharpening the model's sensitivity to minimal but critical causal edits. Extensive experiments demonstrate that CF-VLM consistently outperforms strong baselines and state-of-the-art methods on compositional reasoning and generalization benchmarks. Furthermore, it shows promise in mitigating visual hallucinations, indicating improved factual consistency. Our CF-VLM provides a robust foundation for deploying VLMs in high-stakes, real-world scenarios requiring reliable reasoning and interpretability.

large language model, machine learning, natural language, (24 more...)

arXiv.org Artificial Intelligence

2506.17267

Country: North America (0.46)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(3 more...)

Add feedback

MIRAGE: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence

Liu, Chonghan, Wang, Haoran, Henry, Felix, Miao, Pu, Zhang, Yajie, Zhao, Yu, Wu, Peiran

arXiv.org Artificial IntelligenceJun-24-2025

Spatial perception and reasoning are core components of human cognition, encompassing object recognition, spatial relational understanding, and dynamic reasoning. Despite progress in computer vision, existing benchmarks reveal significant gaps in models' abilities to accurately recognize object attributes and reason about spatial relationships, both essential for dynamic reasoning. To address these limitations, we propose MIRAGE, a multi-modal benchmark designed to evaluate models' capabilities in Counting (object attribute recognition), Relation (spatial relational reasoning), and Counting with Relation. Through diverse and complex scenarios requiring fine-grained recognition and reasoning, MIRAGE highlights critical limitations in state-of-the-art models, underscoring the need for improved representations and reasoning frameworks. By targeting these foundational abilities, MIRAGE provides a pathway toward spatiotemporal reasoning in future research.

large language model, natural language, object-oriented architecture, (19 more...)

arXiv.org Artificial Intelligence

2505.10604

Genre: Research Report > Experimental Study (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.86)

Add feedback

GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects

Li, Shujia, Zhang, Haiyu, Chen, Xinyuan, Wang, Yaohui, Ban, Yutong

arXiv.org Artificial IntelligenceJun-19-2025

While diffusion models and large-scale motion datasets have advanced text-driven human motion synthesis, extending these advances to 4D human-object interaction (HOI) remains challenging, mainly due to the limited availability of large-scale 4D HOI datasets. In our study, we introduce GenHOI, a novel two-stage framework aimed at achieving two key objectives: 1) generalization to unseen objects and 2) the synthesis of high-fidelity 4D HOI sequences. In the initial stage of our framework, we employ an Object-AnchorNet to reconstruct sparse 3D HOI keyframes for unseen objects, learning solely from 3D HOI datasets, thereby mitigating the dependence on large-scale 4D HOI datasets. Subsequently, we introduce a Contact-Aware Diffusion Model (ContactDM) in the second stage to seamlessly interpolate sparse 3D HOI keyframes into densely temporally coherent 4D HOI sequences. To enhance the quality of generated 4D HOI sequences, we propose a novel Contact-Aware Encoder within ContactDM to extract human-object contact patterns and a novel Contact-Aware HOI Attention to effectively integrate the contact signals into diffusion models. Experimental results show that we achieve state-of-the-art results on the publicly available OMOMO and 3D-FUTURE datasets, demonstrating strong generalization abilities to unseen objects, while enabling high-fidelity 4D HOI generation.

machine learning, natural language, object-oriented architecture, (17 more...)

arXiv.org Artificial Intelligence

2506.15483

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback