AITopics | Object-Oriented Architecture

Collaborating Authors

Object-Oriented Architecture

News Overviews Instructional Materials AI-Alerts Classics

Shape Expressions with Inheritance

Boneva, Iovka, Gayo, Jose Emilio Labra, Prud'hommeaux, Eric, Thornton, Katherine, Waagmeester, Andra

arXiv.org Artificial IntelligenceMar-31-2025

We formally introduce an inheritance mechanism for the Shape Expressions language (ShEx). It is inspired by inheritance in object-oriented programming languages, and provides similar advantages such as reuse, modularity, and more flexible data modelling. Using an example, we explain the main features of the inheritance mechanism. We present its syntax and formal semantics. The semantics is an extension of the semantics of ShEx 2.1. It also directly yields a validation algorithm as an extension of the previous ShEx validation algorithms, while maintaining the same algorithmic complexity.

artificial intelligence, object-oriented architecture, programming language, (15 more...)

arXiv.org Artificial Intelligence

2503.24299

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
(10 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)

Add feedback

VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation

Dao, Alan, Buppodom, Norapat

arXiv.org Artificial IntelligenceMar-27-2025

Comprehending 3D environments is vital for intelligent systems in domains like robotics and autonomous navigation. Voxel grids offer a structured representation of 3D space, but extracting high-level semantic meaning remains challenging. This paper proposes a novel approach utilizing a Vision-Language Model (VLM) to extract "voxel semantics"-object identity, color, and location-from voxel data. Critically, instead of employing complex 3D networks, our method processes the voxel space by systematically slicing it along a primary axis (e.g., the Z-axis, analogous to CT scan slices). These 2D slices are then formatted and sequentially fed into the image encoder of a standard VLM. The model learns to aggregate information across slices and correlate spatial patterns with semantic concepts provided by the language component. This slice-based strategy aims to leverage the power of pre-trained 2D VLMs for efficient 3D semantic understanding directly from voxel representations.

information, natural language, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2503.21214

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.87)

Add feedback

BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts

Xu, Suzhe, Peng, Jialin, Zhang, Chengyuan

arXiv.org Artificial IntelligenceMar-25-2025

Segmentation is a fundamental task in computer vision, with prompt-driven methods gaining prominence due to their flexibility. The recent Segment Anything Model (SAM) has demonstrated powerful point-prompt segmentation capabilities, while text-based segmentation models offer rich semantic understanding. However, existing approaches rarely explore how to effectively combine these complementary modalities for optimal segmentation performance. This paper presents BiPrompt-SAM, a novel dual-modal prompt segmentation framework that fuses the advantages of point and text prompts through an explicit selection mechanism. Specifically, we leverage SAM's inherent ability to generate multiple mask candidates, combined with a semantic guidance mask from text prompts, and explicitly select the most suitable candidate based on similarity metrics. This approach can be viewed as a simplified Mixture of Experts (MoE) system, where the point and text modules act as distinct "experts," and the similarity scoring serves as a rudimentary "gating network." We conducted extensive evaluations on both the Endovis17 medical dataset and RefCOCO series natural image datasets. On Endovis17, BiPrompt-SAM achieved 89.55\% mDice and 81.46\% mIoU, comparable to state-of-the-art specialized medical segmentation models. On the RefCOCO series datasets, our method attained 87.1\%, 86.5\%, and 85.8\% IoU, significantly outperforming existing approaches. Experiments demonstrate that our explicit dual-selection method effectively combines the spatial precision of point prompts with the semantic richness of text prompts, particularly excelling in scenarios involving semantically complex objects, multiple similar objects, and partial occlusions. BiPrompt-SAM not only provides a simple yet effective implementation but also offers a new perspective on multi-modal prompt fusion.

biprompt-sam, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.19769

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.98)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.89)
(2 more...)

Add feedback

Untangling the Influence of Typology, Data and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging

Rice, Enora, Marashian, Ali, Haynie, Hannah, von der Wense, Katharina, Palmer, Alexis

arXiv.org Artificial IntelligenceMar-25-2025

Cross-lingual transfer learning is an invaluable tool for overcoming data scarcity, yet selecting a suitable transfer language remains a challenge. The precise roles of linguistic typology, training data, and model architecture in transfer language choice are not fully understood. We take a holistic approach, examining how both dataset-specific and fine-grained typological features influence transfer language selection for part-of-speech tagging, considering two different sources for morphosyntactic features. While previous work examines these dynamics in the context of bilingual biLSTMS, we extend our analysis to a more modern transfer learning pipeline: zero-shot prediction with pretrained multilingual models. We train a series of transfer language ranking systems and examine how different feature inputs influence ranker performance across architectures. Word overlap, type-token ratio, and genealogical distance emerge as top features across all architectures. Our findings reveal that a combination of typological and dataset-dependent features leads to the best rankings, and that good performance can be obtained with either feature group on its own.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.19979

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.04)
(7 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes

Zhang, Haochen, Zantout, Nader, Kachana, Pujith, Zhang, Ji, Wang, Wenshan

arXiv.org Artificial IntelligenceMar-20-2025

With the recent rise of large language models, vision-language models, and other general foundation models, there is growing potential for multimodal, multi-task robotics that can operate in diverse environments given natural language input. One such application is indoor navigation using natural language instructions. However, despite recent progress, this problem remains challenging due to the 3D spatial reasoning and semantic understanding required. Additionally, the language used may be imperfect or misaligned with the scene, further complicating the task. To address this challenge, we curate a benchmark dataset, IRef-VLA, for Interactive Referential Vision and Language-guided Action in 3D Scenes with imperfect references. IRef-VLA is the largest real-world dataset for the referential grounding task, consisting of over 11.5K scanned 3D rooms from existing datasets, 7.6M heuristically generated semantic relations, and 4.7M referential statements. Our dataset also contains semantic object and room annotations, scene graphs, navigable free space annotations, and is augmented with statements where the language has imperfections or ambiguities. We verify the generalizability of our dataset by evaluating with state-of-the-art models to obtain a performance baseline and also develop a graph-search baseline to demonstrate the performance bound and generation of alternatives using scene-graph knowledge. With this benchmark, we aim to provide a resource for 3D scene understanding that aids the development of robust, interactive navigation systems. The dataset and all source code is publicly released at https://github.com/HaochenZ11/IRef-VLA.

large language model, natural language, object-oriented architecture, (17 more...)

arXiv.org Artificial Intelligence

2503.17406

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.66)

Add feedback

Data Spatial Programming

Mars, Jason

arXiv.org Artificial IntelligenceMar-19-2025

We introduce a novel programming model, Data Spatial Programming, which extends the semantics of Object-Oriented Programming (OOP) by introducing new class-like constructs called archetypes. These archetypes encapsulate spatial relationships between data entities and execution flow in a structured manner, enabling more expressive and semantically rich computations over interconnected data structures. By formalizing the relationships between data elements in space, our approach allows for more intuitive modeling of complex systems where the topology of connections is essential to the underlying computational model. This paradigm addresses limitations in traditional OOP when representing dynamically evolving networks, agent-based systems, and other spatially-oriented computational problems.

large language model, natural language, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2503.15812

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

Evaluating the Application of SOLID Principles in Modern AI Framework Architectures

Shrestha, Jonesh

arXiv.org Artificial IntelligenceMar-17-2025

This research evaluates the extent to which modern AI frameworks, specifically TensorFlow and scikit-learn, adhere to the SOLID design principles - Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion. Analyzing the frameworks architectural documentation and design philosophies, this research investigates architectural trade-offs when balancing software engineering best practices with AI-specific needs. I examined each frameworks documentation, source code, and architectural components to evaluate their adherence to these principles. The results show that both frameworks adopt certain aspects of SOLID design principles but make intentional trade-offs to address performance, scalability, and the experimental nature of AI development. TensorFlow focuses on performance and scalability, sometimes sacrificing strict adherence to principles like Single Responsibility and Interface Segregation. While scikit-learns design philosophy aligns more closely with SOLID principles through consistent interfaces and composition principles, sticking closer to SOLID guidelines but with occasional deviations for performance optimizations and scalability. This research discovered that applying SOLID principles in AI frameworks depends on context, as performance, scalability, and flexibility often require deviations from traditional software engineering principles. This research contributes to understanding how domain-specific constraints influence architectural decisions in modern AI frameworks and how these frameworks strategically adapted design choices to effectively balance these contradicting requirements.

artificial intelligence, machine learning, object-oriented architecture, (14 more...)

arXiv.org Artificial Intelligence

2503.13786

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Georgia > Chatham County > Savannah (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval

Wagner, Stefan Sylvius, Harmeling, Stefan

arXiv.org Artificial IntelligenceMar-15-2025

Object-centric learning is fundamental to human vision and crucial for models requiring complex reasoning. Traditional approaches rely on slot-based bottlenecks to learn object properties explicitly, while recent self-supervised vision models like DINO have shown emergent object understanding. However, DINO representations primarily capture global scene features, often confounding individual object attributes. We investigate the effectiveness of DINO representations and slot-based methods for multi-object instance retrieval. Our findings reveal that DINO representations excel at capturing global object attributes such as object shape and size, but struggle with object-level details like colour, whereas slot-based representations struggle at both global and object-level understanding. To address this, we propose a method that combines global and local features by augmenting DINO representations with object-centric latent vectors from a Variational Autoencoder trained on segmented image patches that are extracted from the DINO features. This approach improves multi-object instance retrieval performance, bridging the gap between global scene understanding and fine-grained object representation without requiring full model retraining.

artificial intelligence, machine learning, object-oriented architecture, (17 more...)

arXiv.org Artificial Intelligence

2503.09867

Country: Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

CHOrD: Generation of Collision-Free, House-Scale, and Organized Digital Twins for 3D Indoor Scenes with Controllable Floor Plans and Optimal Layouts

Su, Chong, Fu, Yingbin, Hu, Zheyuan, Yang, Jing, Hanji, Param, Wang, Shaojun, Zhao, Xuan, Öztireli, Cengiz, Zhong, Fangcheng

arXiv.org Artificial IntelligenceMar-14-2025

We introduce CHOrD, a novel framework for scalable synthesis of 3D indoor scenes, designed to create house-scale, collision-free, and hierarchically structured indoor digital twins. In contrast to existing methods that directly synthesize the scene layout as a scene graph or object list, CHOrD incorporates a 2D image-based intermediate layout representation, enabling effective prevention of collision artifacts by successfully capturing them as out-of-distribution (OOD) scenarios during generation. Furthermore, unlike existing methods, CHOrD is capable of generating scene layouts that adhere to complex floor plans with multi-modal controls, enabling the creation of coherent, house-wide layouts robust to both geometric and semantic variations in room structures. Additionally, we propose a novel dataset with expanded coverage of household items and room configurations, as well as significantly improved data quality. CHOrD demonstrates state-of-the-art performance on both the 3D-FRONT and our proposed datasets, delivering photorealistic, spatially coherent indoor scene synthesis adaptable to arbitrary floor plan variations.

chord, dataset, layout, (15 more...)

arXiv.org Artificial Intelligence

2503.11958

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Robots (0.68)
(2 more...)

Add feedback

Expelled! review – turning the tables on the private school class hierarchy

The GuardianMar-12-2025, 08:00:11 GMT

As with seemingly everything in the UK, it all comes back to the class system. Verity Amersham, a scholarship student at Miss Mulligatawney's School for Promising Girls, is accused of pushing the hockey captain out of a window, and the school's fearsome headmistress is determined to expel her despite the flimsiest evidence. When Verity protests her innocence, Miss Mulligatawney remains unpersuaded, spelling out her reasoning in plain terms: as a northerner with working-class parents, Verity simply isn't the "right sort". The injustice of it all is a potent driver, ensuring I set about my goal of preventing Verity's expulsion with determined zeal, much like Matilda defying the hateful Miss Trunchbull. As in developer Inkle's 2021 game Overboard!, you're given a time limit to work within and a handful of areas to move between, from the library to the sick room (AKA the "san", where the school's grumpy matron lurks). Each area has characters to talk to and objects to find, and each action moves the clock forward.

artificial intelligence, object-oriented architecture, private school class hierarchy, (7 more...)

The Guardian

Country: Europe > United Kingdom (0.26)

Industry: Education > Educational Setting (0.87)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.40)

Add feedback