AITopics

2509.12754

Country: Asia > Japan > Honshū > Kansai (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Renz, Marian, Igelbrink, Felix, Atzmueller, Martin

Integrating Prior Observations for Incremental 3D Scene Graph Prediction

arXiv.org Artificial Intelligence, Sep-16-2025

3D semantic scene graphs (3DSSG) provide compact structured representations of environments by explicitly modeling objects, attributes, and relationships. While 3DSSGs have shown promise in robotics and embodied AI, many existing methods rely mainly on sensor data, not integrating further information from semantically rich environments. Additionally, most methods assume access to complete scene reconstructions, limiting their applicability in real-world, incremental settings. This paper introduces a novel heterogeneous graph model for incremental 3DSSG prediction that integrates additional, multi-modal information, such as prior observations, directly into the message-passing process. Utilizing multiple layers, the model flexibly incorporates global and local scene representations without requiring specialized modules or full scene reconstructions. We evaluate our approach on the 3DSSG dataset, showing that GNNs enriched with multi-modal information such as semantic embeddings (e.g., CLIP) and prior observations offer a scalable and generalizable solution for complex, real-world environments. The full source code of the presented architecture will be made available at https://github.com/m4renz/incremental-scene-graph-prediction.

artificial intelligence, machine learning, object-oriented architecture, (17 more...)

2509.11895

Country: Europe > Germany > Lower Saxony (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.48)

Farrukh, Abdullah, Wagner, Achim, Ruskowski, Martin

Enabling Generic Robot Skill Implementation Using Object Oriented Programming

arXiv.org Artificial Intelligence, Sep-8-2025

Developing robotic algorithms and integrating a robotic subsystem into a larger system can be a difficult task. Particularly in small and medium-sized enterprises (SMEs) where robotics expertise is lacking, implementing, maintaining and developing robotic systems can be a challenge. As a result, many companies rely on external expertise through system integrators, which, in some cases, can lead to vendor lock-in and external dependency. In the academic research on intelligent manufacturing systems, robots play a critical role in the design of robust autonomous systems. Similar challenges are faced by researchers who want to use robotic systems as a component in a larger smart system, without having to deal with the complexity and vastness of the robot interfaces in detail. In this paper, we propose a software framework that reduces the effort required to deploy a working robotic system. The focus is solely on providing a concept for simplifying the different interfaces of a modern robot system and using an abstraction layer for different manufacturers and models. The Python programming language is used to implement a prototype of the concept. The target system is a bin-picking cell containing a Yaskawa Motoman GP4.
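
The abstraction-layer idea described above can be illustrated with a minimal Python sketch. It is not the authors' framework: the names `Robot`, `SimulatedRobot`, `move_to`, `set_gripper`, and `pick_and_place` are hypothetical, and the simulated backend only records commands instead of talking to a real controller such as the one in a Yaskawa cell.

```python
from abc import ABC, abstractmethod


class Robot(ABC):
    """Manufacturer-independent interface (hypothetical, not from the paper)."""

    @abstractmethod
    def move_to(self, pose):
        """Move the tool to a Cartesian pose given as (x, y, z)."""

    @abstractmethod
    def set_gripper(self, closed):
        """Close the gripper if `closed` is True, otherwise open it."""


class SimulatedRobot(Robot):
    """Stand-in backend that records commands instead of driving hardware."""

    def __init__(self):
        self.log = []

    def move_to(self, pose):
        self.log.append(("move_to", tuple(pose)))

    def set_gripper(self, closed):
        self.log.append(("gripper", "close" if closed else "open"))


def pick_and_place(robot, pick_pose, place_pose):
    """A generic skill written only against the Robot interface."""
    robot.set_gripper(False)
    robot.move_to(pick_pose)
    robot.set_gripper(True)
    robot.move_to(place_pose)
    robot.set_gripper(False)


robot = SimulatedRobot()
pick_and_place(robot, (0.4, 0.0, 0.1), (0.0, 0.4, 0.1))
```

Because `pick_and_place` depends only on the abstract interface, supporting another manufacturer or model would, under this assumption, mean adding one more `Robot` subclass while the skill code stays unchanged.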

artificial intelligence, interface, object-oriented architecture, (12 more...)

doi: 10.1007/978-3-032-02106-9_47

2508.10497

Country: Europe > Germany (0.15)

Genre: Research Report (0.83)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)

arXiv.org Artificial Intelligence, Aug-27-2025

Steerable Scene Generation with Post Training and Inference-Time Search

Pfaff, Nicholas, Dai, Hongkai, Zakharov, Sergey, Iwase, Shun, Tedrake, Russ

Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement, are rare and costly to curate manually. Instead, we generate large-scale scene data using procedural models that approximate realistic environments for robotic manipulation, and adapt it to task-specific goals. We do this by training a unified diffusion-based generative model that predicts which objects to place from a fixed asset library, along with their SE(3) poses. This model serves as a flexible scene prior that can be adapted using reinforcement learning-based post training, conditional generation, or inference-time search, steering generation toward downstream objectives even when they differ from the original data distribution. Our method enables goal-directed scene synthesis that respects physical feasibility and scales across scene types. We introduce a novel MCTS-based inference-time search strategy for diffusion models, enforce feasibility via projection and simulation, and release a dataset of over 44 million SE(3) scenes spanning five diverse environments. Website with videos, code, data, and model weights: https://steerable-scene-generation.github.io/

machine learning, natural language, object-oriented architecture, (20 more...)

2505.04831

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial Intelligence, Aug-26-2025

Few-Shot Pattern Detection via Template Matching and Regression

Jo, Eunchan, Kang, Dahyun, Kim, Sanghyun, Choi, Yunseon, Cho, Minsu

We address the problem of few-shot pattern detection, which aims to detect all instances of a given pattern, typically represented by a few exemplars, from an input image. Although similar problems have been studied in few-shot object counting and detection (FSCD), previous methods and their benchmarks have narrowed patterns of interest to object categories and often fail to localize non-object patterns. In this work, we propose a simple yet effective detector based on template matching and regression, dubbed TMR. While previous FSCD methods typically represent target exemplars as spatially collapsed prototypes and lose structural information, we revisit classic template matching and regression. It effectively preserves and leverages the spatial layout of exemplars through a minimalistic structure with a small number of learnable convolutional or projection layers on top of a frozen backbone. We also introduce a new dataset, dubbed RPINE, which covers a wider range of patterns than existing object-centric datasets. Our method outperforms the state-of-the-art methods on the three benchmarks, RPINE, FSCD-147, and FSCD-LVIS, and demonstrates strong generalization in cross-dataset evaluation.
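
As a rough illustration of the classic template matching that the abstract revisits, the sketch below slides an exemplar over an image and scores every position with normalized cross-correlation. It is not TMR itself: TMR operates on features from a frozen backbone with learnable layers and a regression step, whereas this example works on raw pixel values, and the function name `match_template`, the toy image, and the 0.99 threshold are assumptions made for the example.

```python
import numpy as np


def match_template(image, template):
    """Return normalized cross-correlation scores for every valid position."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    h = image.shape[0] - th + 1
    w = image.shape[1] - tw + 1
    scores = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * t_norm
            # Flat patches have zero variance; give them a score of 0.
            scores[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores


# Toy image containing two copies of the exemplar pattern.
image = np.zeros((6, 6))
template = np.array([[1.0, 2.0], [3.0, 4.0]])
image[1:3, 1:3] = template
image[3:5, 4:6] = template

scores = match_template(image, template)
# Keep positions whose score is (almost) a perfect match.
hits = sorted((int(y), int(x)) for y, x in np.argwhere(scores > 0.99))
```

Because the whole exemplar is compared position by position, its spatial layout is kept rather than collapsed into a single prototype vector, which is the property the abstract highlights.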

exemplar, machine learning, pattern recognition, (17 more...)

2508.17636

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing Systems, Aug-20-2025, 09:57:49 GMT

An ablation study over different model architectures (Table (a)) shows that the chosen

FB15k's lack of hierarchy offers no advantage to hyperbolic embeddings, but its large number MuRP does not also set out to include MTL, but we hope to address this in future work. We will include all recommendations, e.g. However, we agree that it is important to compare models across a range of dimensionalities. Note that for MuRP with biases replaced by (transformed) norms, performance reduces (e.g. Multi-relational transforms and Justification for architecture: See "Architecture ablation study".

ablation study, architecture, relation, (16 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.41)

Neural Information Processing Systems, Aug-20-2025, 07:44:12 GMT

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

Kevin Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Josh Tenenbaum, Tomer Ullman

From infancy, humans have expectations about how objects will move and interact.

representation, scenario, video, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.70)

arXiv.org Artificial Intelligence, Aug-20-2025

The 9th AI City Challenge

Tang, Zheng, Wang, Shuo, Anastasiu, David C., Chang, Ming-Ching, Sharma, Anuj, Kong, Quan, Kobori, Norimasa, Gochoo, Munkhjargal, Batnasan, Ganzorig, Otgonbold, Munkh-Erdene, Alnajjar, Fady, Hsieh, Jun-Wei, Kornuta, Tomasz, Li, Xiaolong, Zhao, Yilin, Zhang, Han, Radhakrishnan, Subhashree, Jain, Arihant, Kumar, Ratnesh, Murali, Vidya N., Wang, Yuxing, Pusegaonkar, Sameer Satish, Wang, Yizhou, Biswas, Sujit, Wu, Xunlei, Zheng, Zhedong, Chakraborty, Pranamesh, Chellappa, Rama

The ninth AI City Challenge continues to advance real-world applications of computer vision and AI in transportation, industrial automation, and public safety. The 2025 edition featured four tracks and saw a 17% increase in participation, with 245 teams from 15 countries registered on the evaluation server. Public release of challenge datasets led to over 30,000 downloads to date. Track 1 focused on multi-class 3D multi-camera tracking, involving people, humanoids, autonomous mobile robots, and forklifts, using detailed calibration and 3D bounding box annotations. Track 2 tackled video question answering in traffic safety, with multi-camera incident understanding enriched by 3D gaze labels. Track 3 addressed fine-grained spatial reasoning in dynamic warehouse environments, requiring AI systems to interpret RGB-D inputs and answer spatial questions that combine perception, geometry, and language. Both Track 1 and Track 3 datasets were generated in NVIDIA Omniverse. Track 4 emphasized efficient road object detection from fisheye cameras, supporting lightweight, real-time deployment on edge devices. The evaluation framework enforced submission limits and used a partially held-out test set to ensure fair benchmarking. Final rankings were revealed after the competition concluded, fostering reproducibility and mitigating overfitting. Several teams achieved top-tier results, setting new benchmarks in multiple tasks.

large language model, machine learning, natural language, (21 more...)

2508.13564

Country:

North America > United States (0.36)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.40)

Industry:

Information Technology (0.36)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)

Neural Information Processing Systems, Aug-19-2025, 18:18:30 GMT

Unsupervised Causal Generative Understanding of Images

We present a novel framework for unsupervised object-centric 3D scene understanding that generalizes robustly to out-of-distribution images.

artificial intelligence, machine learning, object-oriented architecture, (15 more...)

Country:

Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Europe > France (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Neural Information Processing Systems, Aug-19-2025, 16:41:53 GMT

Supplementary Materials of Decoupling Features in Hierarchical Propagation for Video Object Segmentation

The optimization strategies and related hyper-parameters are also the same as AOT. The loss function is a 0.5:0.5 combination of BCE loss [...] Such a process is necessary to keep enough long-term information and avoid running out of memory when inferring long videos. The longest video in VOT 2020 contains 1,500 frames. We compare our DeAOT with more VOS methods in Tables 1 and 2. VOS cases, including similar objects, occlusion, fast motion, motion blur, etc. A.4 Broader Impact and Limitations: The proposed DeAOT framework significantly improves VOS performance and robustness. As to limitations, scenarios with multiple similar objects and severe occlusions are still very challenging for DeAOT and other VOS solutions.

artificial intelligence, object-oriented architecture, segmentation, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.55)