AITopics | Spatial Reasoning

Spatial Reasoning

Spatial-ViLT: Enhancing Visual Spatial Reasoning through Multi-Task Learning

Islam, Chashi Mahiul, Mamo, Oteo, Chacko, Samuel Jacob, Liu, Xiuwen, Yu, Weikuan

arXiv.org Artificial Intelligence, Oct-7-2025

Vision-language models (VLMs) have advanced multimodal reasoning but still face challenges in spatial reasoning for 3D scenes and complex object configurations. To address this, we introduce SpatialViLT, an enhanced VLM that integrates spatial features like depth maps, 3D coordinates, and edge maps through a multi-task learning framework. This approach enriches multimodal embeddings with spatial understanding. We propose two variants: SpatialViLT and MaskedSpatialViLT, focusing on full and masked object regions, respectively. Additionally, SpatialEnsemble combines both approaches, achieving state-of-the-art accuracy. Our models excel in spatial reasoning categories such as directional, topological, and proximity relations, as demonstrated on the challenging Visual Spatial Reasoning (VSR) dataset. This work represents a significant step in enhancing the spatial intelligence of AI systems, crucial for advanced multimodal understanding and real-world applications.
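The abstract does not state the training objective, so the following is only a minimal sketch of how a multi-task setup of this kind is commonly written: a main loss plus weighted auxiliary losses for the spatial targets. The binary main loss, the per-task weights, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List


def binary_cross_entropy(prob: float, label: int) -> float:
    """Main-task loss, assuming a binary true/false prediction."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mse(pred: List[float], target: List[float]) -> float:
    """Auxiliary reconstruction loss, e.g. for a flattened depth or edge map."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(main_loss: float,
                   aux_losses: Dict[str, float],
                   aux_weights: Dict[str, float]) -> float:
    """Main loss plus a weighted sum of auxiliary losses (weights are hypothetical)."""
    return main_loss + sum(aux_weights[name] * value
                           for name, value in aux_losses.items())


# Hypothetical usage with tiny stand-ins for the depth and edge targets.
aux = {
    "depth": mse([0.2, 0.8], [0.0, 1.0]),
    "edges": mse([1.0, 0.0], [1.0, 0.0]),
}
total = multitask_loss(binary_cross_entropy(0.9, 1), aux,
                       {"depth": 0.5, "edges": 0.5})
```

The auxiliary terms only shape the shared representation during training; at inference time only the main prediction would be used under this assumption.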

artificial intelligence, machine learning, spatial reasoning, (14 more...)

arXiv.org Artificial Intelligence

2510.03441

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Decoupling Geometry from Optimization in 2D Irregular Cutting and Packing Problems: an Open-Source Collision Detection Engine

Gardeyn, Jeroen, Berghe, Greet Vanden, Wauters, Tony

arXiv.org Artificial Intelligence, Oct-6-2025

Addressing irregular cutting and packing (C&P) optimization problems poses two distinct challenges: the geometric challenge of determining whether or not an item can be placed feasibly at a certain position, and the optimization challenge of finding a good solution according to some objective function. Until now, those tackling such problems have had to address both challenges simultaneously, requiring two distinct sets of expertise and a lot of research & development effort. One way to lower this barrier is to decouple the two challenges. In this paper we introduce a powerful collision detection engine (CDE) for 2D irregular C&P problems which assumes full responsibility for the geometric challenge. The CDE (i) allows users to focus with full confidence on their optimization challenge by abstracting geometry away and (ii) enables independent advances to propagate to all optimization algorithms built atop it. We present a set of core principles and design philosophies to model a general and adaptable CDE focused on maximizing performance, accuracy and robustness. These principles are accompanied by a concrete open-source implementation called jagua-rs. This paper together with its implementation serves as a catalyst for future advances in irregular C&P problems by providing a solid foundation which can either be used as it currently exists or be further improved upon. Funding: This research was supported by the Research Foundation - Flanders (FWO) under grant numbers 1S71222N and K804824N.
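To make the geometric challenge concrete, here is a naive feasibility check for two simple polygons: they collide if any pair of edges properly crosses or if one polygon lies inside the other. This is a textbook brute-force sketch, not the approach used by jagua-rs; all names are hypothetical, and shapes that merely touch are treated as non-colliding.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segments p1p2 and q1q2 properly cross (touching does not count)."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def _contains(poly: Polygon, pt: Point) -> bool:
    """Ray-casting point-in-polygon test; boundary points are not special-cased."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > pt[1]) != (y2 > pt[1]):
            x_cross = x1 + (pt[1] - y1) * (x2 - x1) / (y2 - y1)
            if pt[0] < x_cross:
                inside = not inside
    return inside


def translate(poly: Polygon, dx: float, dy: float) -> Polygon:
    """Place a shape at an offset, as an optimizer would when trying a position."""
    return [(x + dx, y + dy) for x, y in poly]


def collides(a: Polygon, b: Polygon) -> bool:
    """Brute-force O(n*m) collision test between two simple polygons."""
    for i in range(len(a)):
        for j in range(len(b)):
            if _segments_cross(a[i], a[(i + 1) % len(a)],
                               b[j], b[(j + 1) % len(b)]):
                return True
    # No edges cross: the shapes overlap only if one fully contains the other.
    return _contains(a, b[0]) or _contains(b, a[0])


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
overlapping = collides(square, translate(square, 0.5, 0.5))
separate = collides(square, translate(square, 2.0, 0.0))
```

An optimizer that calls a function like `collides` thousands of times per second quickly exposes the cost of this brute-force approach, which is the kind of concern a dedicated engine is meant to take over.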

artificial intelligence, optimization problem, spatial reasoning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1287/ijoc.2024.1025

2508.08341

Country: Europe > Belgium > Flanders (0.34)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.67)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.55)
(2 more...)

An Architecture for Spatial Networking

Millar, Josh, Gibb, Ryan, Ang, Roy, Haddadi, Hamed, Madhavapeddy, Anil

arXiv.org Artificial Intelligence, Oct-6-2025

Physical spaces are increasingly dense with networked devices, promising seamless coordination and ambient intelligence. Yet today, cloud-first architectures force all communication through wide-area networks regardless of physical proximity. We lack an abstraction for spatial networking: using physical spaces to create boundaries for private, robust, and low-latency communication. We introduce Bifröst, a programming model that realizes spatial networking using bigraphs to express both containment and connectivity, enabling policies to be scoped by physical boundaries, devices to be named by location, the instantiation of spatial services, and the composition of spaces while maintaining local autonomy. Bifröst enables a new class of spatially-aware applications, where co-located devices communicate directly, physical barriers require explicit gateways, and local control bridges to global coordination.
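A bigraph pairs a place graph (containment) with a link graph (connectivity). The sketch below illustrates only that data structure, using hypothetical class, method, and device names; it does not reflect Bifröst's actual API or semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Bigraph:
    # Place graph: each node maps to its containing place (None for a root).
    parent: Dict[str, Optional[str]] = field(default_factory=dict)
    # Link graph: each named link connects a set of nodes.
    links: Dict[str, Set[str]] = field(default_factory=dict)

    def add_place(self, name: str, parent: Optional[str] = None) -> None:
        self.parent[name] = parent

    def connect(self, link: str, *nodes: str) -> None:
        self.links.setdefault(link, set()).update(nodes)

    def ancestors(self, name: str) -> List[str]:
        """Enclosing places of a node, innermost first."""
        chain = []
        current = self.parent[name]
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain

    def enclosing_space(self, a: str, b: str) -> Optional[str]:
        """Innermost place that contains both nodes, if any."""
        ancestors_a = self.ancestors(a)
        for place in self.ancestors(b):
            if place in ancestors_a:
                return place
        return None

    def linked(self, a: str, b: str) -> bool:
        """True if some link connects both nodes."""
        return any(a in nodes and b in nodes for nodes in self.links.values())


# Hypothetical building with two rooms.
g = Bigraph()
g.add_place("building")
g.add_place("room1", "building")
g.add_place("room2", "building")
g.add_place("lamp", "room1")
g.add_place("sensor", "room1")
g.add_place("thermostat", "room2")
g.connect("zigbee", "lamp", "sensor")
```

Under this sketch, a policy scoped to `room1` would apply to `lamp` and `sensor` but not to `thermostat`, whose innermost shared space with them is `building`.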

artificial intelligence, boundary, cloud computing, (19 more...)

arXiv.org Artificial Intelligence

2507.22687

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Cloud Computing (1.00)
(2 more...)

Multivariate Sparse Coding of Nonstationary Covariances with Gaussian Processes

Rui Li

Neural Information Processing Systems, Oct-3-2025, 00:46:10 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, correlation, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.46)

Scalable Asynchronous Federated Modeling for Spatial Data

Shi, Jianwei, Abdulah, Sameh, Sun, Ying, Genton, Marc G.

arXiv.org Machine Learning, Oct-3-2025

Spatial data are central to applications such as environmental monitoring and urban planning, but are often distributed across devices where privacy and communication constraints limit direct sharing. Federated modeling offers a practical solution that preserves data privacy while enabling global modeling across distributed data sources. For instance, environmental sensor networks are privacy- and bandwidth-constrained, motivating federated spatial modeling that shares only privacy-preserving summaries to produce timely, high-resolution pollution maps without centralizing raw data. However, existing federated modeling approaches either ignore spatial dependence or rely on synchronous updates that suffer from stragglers in heterogeneous environments. This work proposes an asynchronous federated modeling framework for spatial data based on low-rank Gaussian process approximations. The method employs block-wise optimization and introduces strategies for gradient correction, adaptive aggregation, and stabilized updates. We establish linear convergence with explicit dependence on staleness, a result of standalone theoretical significance. Moreover, numerical experiments demonstrate that the asynchronous algorithm achieves synchronous performance under balanced resource allocation and significantly outperforms it in heterogeneous settings, showcasing superior robustness and scalability. Keywords: Asynchronous federated learning, distributed spatial modeling, Gaussian processes, low-rank approximation, block-wise optimization.

algorithm, asynchronous algorithm, scalable asynchronous federated modeling, (10 more...)

arXiv.org Machine Learning

2510.01771

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Saudi Arabia (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

GeoSQL-Eval: First Evaluation of LLMs on PostGIS-Based NL2GeoSQL Queries

Hou, Shuyang, Jiao, Haoyue, Liu, Ziqi, Xie, Lutong, Chen, Guanyu, Wu, Shaowen, Guan, Xuefeng, Wu, Huayi

arXiv.org Artificial Intelligence, Oct-3-2025

Large language models (LLMs) have shown strong performance in natural language to SQL (NL2SQL) tasks within general databases. However, extending to GeoSQL introduces additional complexity from spatial data types, function invocation, and coordinate systems, which greatly increases generation and execution difficulty. Existing benchmarks mainly target general SQL, and a systematic evaluation framework for GeoSQL is still lacking. To fill this gap, we present GeoSQL-Eval, the first end-to-end automated evaluation framework for PostGIS query generation, together with GeoSQL-Bench, a benchmark for assessing LLM performance in NL2GeoSQL tasks. GeoSQL-Bench defines three task categories (conceptual understanding, syntax-level SQL generation, and schema retrieval) comprising 14,178 instances, 340 PostGIS functions, and 82 thematic databases. GeoSQL-Eval is grounded in Webb's Depth of Knowledge (DOK) model, covering four cognitive dimensions, five capability levels, and twenty task types to establish a comprehensive process from knowledge acquisition and syntax generation to semantic alignment, execution accuracy, and robustness. We evaluate 24 representative models across six categories and apply the entropy weight method with statistical analyses to uncover performance differences, common error patterns, and resource usage. Finally, we release a public GeoSQL-Eval leaderboard platform for continuous testing and global comparison. This work extends the NL2GeoSQL paradigm and provides a standardized, interpretable, and extensible framework for evaluating LLMs in spatial database contexts, offering valuable references for geospatial information science and related applications.
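The entropy weight method mentioned in the abstract assigns larger weights to metrics whose values differ more across the compared models. The sketch below shows the standard form of the method on a hypothetical score table; the metric layout and numbers are made up for illustration and are not taken from GeoSQL-Eval.

```python
import math
from typing import List


def entropy_weights(scores: List[List[float]]) -> List[float]:
    """Entropy weights for a table with one row per model and one column per metric.

    Assumes non-negative scores where higher is better, at least two rows,
    and at least one metric that actually varies across rows.
    """
    n_rows = len(scores)
    n_cols = len(scores[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in scores]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:  # treat 0 * log(0) as 0
                entropy -= p * math.log(p)
        entropy /= math.log(n_rows)      # normalize entropy to [0, 1]
        divergences.append(1.0 - entropy)  # low entropy means a more informative metric
    norm = sum(divergences)
    return [d / norm for d in divergences]


# Hypothetical table: three models, two metrics.
table = [
    [0.80, 0.90],
    [0.80, 0.50],
    [0.80, 0.10],
]
weights = entropy_weights(table)
```

In this made-up table the first metric is identical for every model, so it carries no discriminating information and receives a weight of approximately zero, while the second metric receives nearly all of the weight.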

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.25264

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Unsupervised Emergence of Egocentric Spatial Structure from Sensorimotor Prediction

Alban Laflaquière, Michael Garcia Ortiz

Neural Information Processing Systems, Oct-2-2025, 01:41:04 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, spatial reasoning, (15 more...)

Neural Information Processing Systems

Country: Europe > France (0.14)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.69)

CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification

Li, Wei, Zhang, Renshan, Shao, Rui, He, Jie, Nie, Liqiang

arXiv.org Artificial Intelligence, Oct-2-2025

Recent Vision-Language-Action (VLA) models built on pre-trained Vision-Language Models (VLMs) require extensive post-training, resulting in high computational overhead that limits scalability and deployment. We propose CogVLA, a Cognition-Aligned Vision-Language-Action framework that leverages instruction-driven routing and sparsification to improve both efficiency and performance. CogVLA draws inspiration from human multimodal coordination and introduces a 3-stage progressive architecture. 1) Encoder-FiLM based Aggregation Routing (EFA-Routing) injects instruction information into the vision encoder to selectively aggregate and compress dual-stream visual tokens, forming an instruction-aware latent representation. 2) Building upon this compact visual encoding, LLM-FiLM based Pruning Routing (LFP-Routing) introduces action intent into the language model by pruning instruction-irrelevant visually grounded tokens, thereby achieving token-level sparsity. 3) To ensure that compressed perception inputs can still support accurate and coherent action generation, we introduce V-L-A Coupled Attention (CAtten), which combines causal vision-language attention with bidirectional action parallel decoding. Extensive experiments on the LIBERO benchmark and real-world robotic tasks demonstrate that CogVLA achieves state-of-the-art performance with success rates of 97.4% and 70.0%, respectively, while reducing training costs by 2.5-fold and decreasing inference latency by 2.8-fold compared to OpenVLA. CogVLA is open-sourced and publicly available at https://github.com/JiuTian-VL/CogVLA.
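Both routing stages are described as FiLM-based. FiLM (feature-wise linear modulation) scales and shifts each feature channel using parameters predicted from a conditioning input, here the instruction. The sketch below shows only that generic operation; the tiny linear maps, dimensions, and names are hypothetical and do not reproduce CogVLA's modules.

```python
from typing import List

Vector = List[float]


def linear(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Plain affine map y = W x + b, used to predict the FiLM parameters."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(tokens: List[Vector], gamma: Vector, beta: Vector) -> List[Vector]:
    """Apply gamma * x + beta channel-wise to every visual token."""
    return [[g * v + b for g, v, b in zip(gamma, token, beta)]
            for token in tokens]


# Hypothetical 2-channel example: the instruction embedding decides
# which visual channels are amplified and which are suppressed.
instruction = [1.0, 0.0]
gamma = linear([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0], instruction)  # scale per channel
beta = linear([[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0], instruction)   # shift per channel

visual_tokens = [[1.0, 2.0], [3.0, 4.0]]
modulated = film(visual_tokens, gamma, beta)
```

Because the scale for the second channel is zero in this example, that channel collapses to the shift value for every token, which shows how a conditioning signal can suppress features it deems irrelevant.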

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.21046

Country: Asia > China (0.28)

Genre:

Research Report (1.00)
Workflow (0.94)

Industry:

Leisure & Entertainment (0.67)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate Prediction

Xin, Weilin, Huang, Chenyu, Li, Peilin, Zhong, Jing, Yao, Jiawei

arXiv.org Artificial Intelligence, Oct-2-2025

With rapid urbanization, predicting urban microclimates has become critical, as it affects building energy demand and public health risks. However, existing generative and homogeneous graph approaches fall short in capturing physical consistency, spatial dependencies, and temporal variability. To address this, we introduce UrbanGraph, a physics-informed framework integrating heterogeneous and dynamic spatio-temporal graphs. It encodes key physical processes (vegetation evapotranspiration, shading, and convective diffusion) while modeling complex spatial dependencies among diverse urban entities and their temporal evolution. We evaluate UrbanGraph on UMC4/12, a physics-based simulation dataset covering diverse urban configurations and climates. Results show that UrbanGraph improves $R^2$ by up to 10.8% and reduces FLOPs by 17.0% over all baselines, with heterogeneous and dynamic graphs contributing 3.5% and 7.1% gains. Our dataset provides the first high-resolution benchmark for spatio-temporal microclimate modeling, and our method extends to broader urban heterogeneous dynamic computing tasks.

data mining, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2510.00457

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Renewable (0.46)
Transportation > Infrastructure & Services (0.46)
Transportation > Ground > Road (0.46)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Geo-R1: Unlocking VLM Geospatial Reasoning with Cross-View Reinforcement Learning

Xu, Chenhui, Yu, Fuxun, Bianco, Michael J., Kovarskiy, Jacob, Tang, Raphael, Zhang, Qi, Xu, Zirui, LeVine, Will, Dubbs, Brandon, Liao, Heming, Burgess, Cassandra, Bag, Suvam, Patravali, Jay, Kukal, Rupanjali, Figueroa, Mikael, Madhok, Rishi, Karianakis, Nikolaos, Xiong, Jinjun

arXiv.org Artificial Intelligence, Oct-2-2025

We introduce Geo-R1, a reasoning-centric post-training framework that unlocks geospatial reasoning in vision-language models by combining thinking scaffolding and elevating. In the scaffolding stage, Geo-R1 instills a "geospatial thinking paradigm" via supervised fine-tuning (SFT) on synthetic chain-of-thought exemplars, enabling models to connect visual cues with geographic priors without costly human reasoning annotations. In the elevating stage, it uses GRPO-based reinforcement learning on a weakly-supervised cross-view pairing proxy. This design supplies a verifiable and scalable reward signal: teaching models to capture and reconcile features across modalities, and harnessing reasoning for accurate prediction. Geo-R1 extends geospatial modeling from domain pretraining / supervised finetuning to reasoning-first post-training, and achieves state-of-the-art performance across various geospatial reasoning benchmarks. Our model is available at https://huggingface.co/miniHui/Geo-R1.
Figure 1: Geo-R1 significantly outperforms baseline Bai et al. (2025) across 13 verifiable geo-reasoning tasks on the GeoChain benchmark (Yerramilli et al., 2025) in the zero-shot setting. See Table 6 for a detailed description of these tasks.
Geospatial reasoning is fundamental to a wide range of scientific and societal applications, spanning disaster response, search and rescue, urban planning, environmental monitoring, and sociocultural study. Unlike common vision-language reasoning (Li et al., 2024) centering around object recognition, captioning and general question-answering, geospatial reasoning spans many modalities (e.g., aerial imagery, streetview photos, location metadata, place information, etc.), and varied tasks (e.g., geographical, environmental, sociocultural, etc.) as shown in Figure 1. This blend of multimodal evidence and knowledge-intensive tasking makes general reasoning both crucial for geospatial understanding and uniquely challenging.
While effective in natural domains, SFT is poorly suited to geospatial settings. Geospatial raw data can be plentiful, but supervision is sparse, usually limited to coordinate metadata without descriptive content.
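The elevating stage described above relies on GRPO, which scores each sampled response relative to the other responses drawn for the same prompt instead of using a learned value function. The sketch below shows only that group-relative normalization; the 0/1 reward for a correct cross-view pairing and the function name are assumptions for illustration, not details from the paper.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of responses sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps); eps keeps the
    result finite when every response in the group earns the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four sampled answers: 1.0 if the model picked the
# correct cross-view match, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

When every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is why a verifiable reward that actually separates good from bad responses matters.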

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.00072

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
