AITopics | Spatial Reasoning

Collaborating Authors

Spatial Reasoning

News Overviews Instructional Materials AI-Alerts Classics

Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Liu, Zhenyang, Wang, Yikai, Wang, Kuanning, Liang, Longfei, Xue, Xiangyang, Fu, Yanwei

arXiv.org Artificial IntelligenceJul-15-2025

Visual imitation learning is effective for robots to learn versatile tasks. However, many existing methods rely on behavior cloning with supervised historical trajectories, limiting their 3D spatial and 4D spatiotemporal awareness. Consequently, these methods struggle to capture the 3D structures and 4D spatiotemporal relationships necessary for real-world deployment. In this work, we propose 4D Diffusion Policy (DP4), a novel visual imitation learning method that incorporates spatiotemporal awareness into diffusion-based policies. Unlike traditional approaches that rely on trajectory cloning, DP4 leverages a dynamic Gaussian world model to guide the learning of 3D spatial and 4D spatiotemporal perceptions from interactive environments. Our method constructs the current 3D scene from a single-view RGB-D observation and predicts the future 3D scene, optimizing trajectory generation by explicitly modeling both spatial and temporal dependencies. Extensive experiments across 17 simulation tasks with 173 variants and 3 real-world robotic tasks demonstrate that the 4D Diffusion Policy (DP4) outperforms baseline methods, improving the average simulation task success rate by 16.4% (Adroit), 14% (DexArt), and 6.45% (RLBench), and the average real-world robotic task success rate by 8.6%.

artificial intelligence, gaussian world model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2507.0671

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.82)

Add feedback

SDTN and TRN: Adaptive Spectral-Spatial Feature Extraction for Hyperspectral Image Classification

Ye, Fuyin, Yao, Erwen, Chen, Jianyong, He, Fengmei, Zhang, Junxiang, Ni, Lihao

arXiv.org Artificial IntelligenceJul-15-2025

Hyperspectral image classification plays a pivotal role in precision agriculture, providing accurate insights into crop health monitoring, disease detection, and soil analysis. However, traditional methods struggle with high-dimensional data, spectral-spatial redundancy, and the scarcity of labeled samples, often leading to suboptimal performance. To address these challenges, we propose the Self-Adaptive Tensor- Regularized Network (SDTN), which combines tensor decomposition with regularization mechanisms to dynamically adjust tensor ranks, ensuring optimal feature representation tailored to the complexity of the data. Building upon SDTN, we propose the Tensor-Regularized Network (TRN), which integrates the features extracted by SDTN into a lightweight network capable of capturing spectral-spatial features at multiple scales. This approach not only maintains high classification accuracy but also significantly reduces computational complexity, making the framework highly suitable for real-time deployment in resource-constrained environments. Experiments on PaviaU datasets demonstrate significant improvements in accuracy and reduced model parameters compared to state-of-the-art methods.

artificial intelligence, data mining, machine learning, (5 more...)

arXiv.org Artificial Intelligence

2507.09492

Genre: Research Report (0.69)

Industry: Food & Agriculture > Agriculture (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.87)
Information Technology > Sensing and Signal Processing > Image Processing (0.60)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.60)
(2 more...)

Add feedback

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

AI, Inclusion, :, null, Wang, Fudong, Liu, Jiajia, Chen, Jingdong, Zhou, Jun, Ji, Kaixiang, Ru, Lixiang, Guo, Qingpei, Zheng, Ruobing, Li, Tianqi, Yuan, Yi, Mao, Yifan, Xiao, Yuting, Ma, Ziping

arXiv.org Artificial IntelligenceJul-14-2025

Recent advancements in Multimodal Large Language Models (MLLMs), particularly through Reinforcement Learning with Verifiable Rewards (RLVR), have significantly enhanced their reasoning abilities. However, a critical gap persists: these models struggle with dynamic spatial interactions, a capability essential for real-world applications. To bridge this gap, we introduce M2-Reasoning-7B, a model designed to excel in both general and spatial reasoning. Our approach integrates two key innovations: (1) a novel data pipeline that generates 294.2K high-quality data samples (168K for cold-start fine-tuning and 126.2K for RLVR), which feature logically coherent reasoning trajectories and have undergone comprehensive assessment; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, and task-specific rewards for delivering tailored incentive signals. This combination of curated data and advanced training allows M2-Reasoning-7B to set a new state-of-the-art (SOTA) across 8 benchmarks, showcasing superior performance in both general and spatial reasoning domains.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.08306

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction

Neogi, Pinaki Prasad Guha, Mohammadshirazi, Ahmad, Ramnath, Rajiv

arXiv.org Artificial IntelligenceJul-14-2025

Traffic accidents are rare, yet high-impact events that require long-context multimodal reasoning for accurate risk forecasting. In this paper, we introduce ALCo-FM, a unified adaptive long-context foundation model that computes a volatility pre-score to dynamically select context windows for input data and encodes and fuses these multimodal data via shallow cross attention. Following a local GAT layer and a BigBird-style sparse global transformer over H3 hexagonal grids, coupled with Monte Carlo dropout for confidence, the model yields superior, well-calibrated predictions. Trained on data from 15 US cities with a class-weighted loss to counter label imbalance, and fine-tuned with minimal data on held-out cities, ALCo-FM achieves 0.94 accuracy, 0.92 F1, and an ECE of 0.04, outperforming more than 20 state-of-the-art baselines in large-scale urban risk prediction. Code and dataset are available at: https://github.com/PinakiPrasad12/ALCo-FM

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.08153

Country:

Asia (0.67)
North America > United States > California (0.46)
North America > United States > Texas (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Information Technology (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
(4 more...)

Add feedback

SURPRISE3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes

Huang, Jiaxin, Li, Ziwen, Zhang, Hanlve, Chen, Runnan, He, Xiao, Guo, Yandong, Wang, Wenping, Liu, Tongliang, Gong, Mingming

arXiv.org Artificial IntelligenceJul-11-2025

The integration of language and 3D perception is critical for embodied AI and robotic systems to perceive, understand, and interact with the physical world. Spatial reasoning, a key capability for understanding spatial relationships between objects, remains underexplored in current 3D vision-language research. Existing datasets often mix semantic cues (e.g., object name) with spatial context, leading models to rely on superficial shortcuts rather than genuinely interpreting spatial relationships. To address this gap, we introduce S\textsc{urprise}3D, a novel dataset designed to evaluate language-guided spatial reasoning segmentation in complex 3D scenes. S\textsc{urprise}3D consists of more than 200k vision language pairs across 900+ detailed indoor scenes from ScanNet++ v2, including more than 2.8k unique object classes. The dataset contains 89k+ human-annotated spatial queries deliberately crafted without object name, thereby mitigating shortcut biases in spatial understanding. These queries comprehensively cover various spatial reasoning skills, such as relative position, narrative perspective, parametric perspective, and absolute distance reasoning. Initial benchmarks demonstrate significant challenges for current state-of-the-art expert 3D visual grounding methods and 3D-LLMs, underscoring the necessity of our dataset and the accompanying 3D Spatial Reasoning Segmentation (3D-SRS) benchmark suite. S\textsc{urprise}3D and 3D-SRS aim to facilitate advancements in spatially aware AI, paving the way for effective embodied interaction and robotic planning. The code and datasets can be found in https://github.com/liziwennba/SUPRISE.

artificial intelligence, reasoning, spatial reasoning, (16 more...)

arXiv.org Artificial Intelligence

2507.07781

Country: North America > United States > Texas (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)

Add feedback

Feature-free regression kriging

Luo, Peng, Wu, Yilong, Song, Yongze

arXiv.org Machine LearningJul-11-2025

Spatial interpolation is a crucial task in geography. As perhaps the most widely used interpolation methods, geostatistical models -- such as Ordinary Kriging (OK) -- assume spatial stationarity, which makes it difficult to capture the nonstationary characteristics of geographic variables. A common solution is trend surface modeling (e.g., Regression Kriging, RK), which relies on external explanatory variables to model the trend and then applies geostatistical interpolation to the residuals. However, this approach requires high-quality and readily available explanatory variables, which are often lacking in many spatial interpolation scenarios -- such as estimating heavy metal concentrations underground. This study proposes a Feature-Free Regression Kriging (FFRK) method, which automatically extracts geospatial features -- including local dependence, local heterogeneity, and geosimilarity -- to construct a regression-based trend surface without requiring external explanatory variables. We conducted experiments on the spatial distribution prediction of three heavy metals in a mining area in Australia. In comparison with 17 classical interpolation methods, the results indicate that FFRK, which does not incorporate any explanatory variables and relies solely on extracted geospatial features, consistently outperforms both conventional Kriging techniques and machine learning models that depend on explanatory variables. This approach effectively addresses spatial nonstationarity while reducing the cost of acquiring explanatory variables, improving both prediction accuracy and generalization ability. This finding suggests that an accurate characterization of geospatial features based on domain knowledge can significantly enhance spatial prediction performance -- potentially yielding greater improvements than merely adopting more advanced statistical models.

artificial intelligence, machine learning, spatial reasoning, (20 more...)

arXiv.org Machine Learning

2507.07382

Country:

Oceania > Australia > Western Australia > Perth (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Perceptual Distortions and Autonomous Representation Learning in a Minimal Robotic System

Warutumo, David, Maina, Ciira wa

arXiv.org Artificial IntelligenceJul-11-2025

Autonomous agents, particularly in the field of robotics, rely on sensory information to perceive and navigate their environment. However, these sensory inputs are often imperfect, leading to distortions in the agent's internal representation of the world. This paper investigates the nature of these perceptual distortions and how they influence autonomous representation learning using a minimal robotic system. We utilize a simulated two - wheeled robot equipped with distance sensors and a compass, operating w ithin a simple square environment. Through analysis of the robot's sensor data during random exploration, we demonstrate how a distorted perceptual space emerges. Despite these distortions, we identify emergent structures within the perceptual space that c orrelate with the physical environment, revealing how the robot autonomously learns a structured representation for navigation without explicit spatial information. This work contributes to the understanding of embodied cognition, minimal agency, and the r ole of perception in self - generated navigation strategies in artificial life.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.07845

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(2 more...)

Add feedback

Multilayer GNN for Predictive Maintenance and Clustering in Power Grids

Kazim, Muhammad, Pirim, Harun, Le, Chau, Le, Trung, Yadav, Om Prakash

arXiv.org Artificial IntelligenceJul-11-2025

Unplanned power outages cost the US economy over $150 billion annually, partly due to predictive maintenance (PdM) models that overlook spatial, temporal, and causal dependencies in grid failures. This study introduces a multilayer Graph Neural Network (GNN) framework to enhance PdM and enable resilience-based substation clustering. Using seven years of incident data from Oklahoma Gas & Electric (292,830 records across 347 substations), the framework integrates Graph Attention Networks (spatial), Graph Convolutional Networks (temporal), and Graph Isomorphism Networks (causal), fused through attention-weighted embeddings. Our model achieves a 30-day F1-score of 0.8935 +/- 0.0258, outperforming XGBoost and Random Forest by 3.2% and 2.7%, and single-layer GNNs by 10 to 15 percent. Removing the causal layer drops performance to 0.7354 +/- 0.0418. For resilience analysis, HDBSCAN clustering on HierarchicalRiskGNN embeddings identifies eight operational risk groups. The highest-risk cluster (Cluster 5, 44 substations) shows 388.4 incidents/year and 602.6-minute recovery time, while low-risk groups report fewer than 62 incidents/year. ANOVA (p < 0.0001) confirms significant inter-cluster separation. Our clustering outperforms K-Means and Spectral Clustering with a Silhouette Score of 0.626 and Davies-Bouldin index of 0.527. This work supports proactive grid management through improved failure prediction and risk-aware substation clustering.

artificial intelligence, machine learning, substation, (17 more...)

arXiv.org Artificial Intelligence

2507.07298

Country: North America > United States > Oklahoma (0.25)

Genre: Research Report > New Finding (0.88)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.89)

Add feedback

A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding

Liu, Zhenyang, Zheng, Sixiao, Chen, Siyu, Zhao, Cairong, Liang, Longfei, Xue, Xiangyang, Fu, Yanwei

arXiv.org Artificial IntelligenceJul-10-2025

Open-vocabulary 3D visual grounding aims to localize target objects based on free-form language queries, which is crucial for embodied AI applications such as autonomous navigation, robotics, and augmented reality. Learning 3D language fields through neural representations enables accurate understanding of 3D scenes from limited viewpoints and facilitates the localization of target objects in complex environments. However, existing language field methods struggle to accurately localize instances using spatial relations in language queries, such as ``the book on the chair.'' This limitation mainly arises from inadequate reasoning about spatial relations in both language queries and 3D scenes. In this work, we propose SpatialReasoner, a novel neural representation-based framework with large language model (LLM)-driven spatial reasoning that constructs a visual properties-enhanced hierarchical feature field for open-vocabulary 3D visual grounding. To enable spatial reasoning in language queries, SpatialReasoner fine-tunes an LLM to capture spatial relations and explicitly infer instructions for the target, anchor, and spatial relation. To enable spatial reasoning in 3D scenes, SpatialReasoner incorporates visual properties (opacity and color) to construct a hierarchical feature field. This field represents language and instance features using distilled CLIP features and masks extracted via the Segment Anything Model (SAM). The field is then queried using the inferred instructions in a hierarchical manner to localize the target 3D instance based on the spatial relation in the language query. Extensive experiments show that our framework can be seamlessly integrated into different neural representations, outperforming baseline models in 3D visual grounding while empowering their spatial reasoning capability.

artificial intelligence, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2507.06719

Country: Asia > China (0.48)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

OLG++: A Semantic Extension of Obligation Logic Graph

Dasgupta, Subhasis, Stephens, Jon, Gupta, Amarnath

arXiv.org Artificial IntelligenceJul-9-2025

We present OLG++, a semantic extension of the Obligation Logic Graph (OLG) for modeling regulatory and legal rules in municipal and interjurisdictional contexts. OLG++ introduces richer node and edge types, including spatial, temporal, party group, defeasibility, and logical grouping constructs, enabling nuanced representations of legal obligations, exceptions, and hierarchies. The model supports structured reasoning over rules with contextual conditions, precedence, and complex triggers. We demonstrate its expressiveness through examples from food business regulations, showing how OLG++ supports legal question answering using property graph queries. OLG++ also improves over LegalRuleML by providing native support for subClassOf, spatial constraints, and reified exception structures. Our examples show that OLG++ is more expressive than prior graph-based models for legal knowledge representation.

artificial intelligence, natural language, obligation, (15 more...)

arXiv.org Artificial Intelligence

2507.05488

Country: North America > United States > California > San Diego County (0.17)

Genre: Research Report (0.50)

Industry: Law > Statutes (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.47)

Add feedback