AITopics | Spatial Reasoning

Collaborating Authors

Spatial Reasoning

News Overviews Instructional Materials AI-Alerts Classics

Positional Encoder Graph Quantile Neural Networks for Geographic Data

de Amorim, William E. R., Sisson, Scott A., Rodrigues, T., Nott, David J., Rodrigues, Guilherme S.

arXiv.org Machine LearningSep-27-2024

Positional Encoder Graph Neural Networks (PE-GNNs) are a leading approach for modeling continuous spatial data. However, they often fail to produce calibrated predictive distributions, limiting their effectiveness for uncertainty quantification. We introduce the Positional Encoder Graph Quantile Neural Network (PE-GQNN), a novel method that integrates PE-GNNs, Quantile Neural Networks, and recalibration techniques in a fully nonparametric framework, requiring minimal assumptions about the predictive distributions. We propose a new network architecture that, when combined with a quantile-based loss function, yields accurate and reliable probabilistic models without increasing computational complexity. Our approach provides a flexible, robust framework for conditional density estimation, applicable beyond spatial data contexts. We further introduce a structured method for incorporating a KNN predictor into the model while avoiding data leakage through the GNN layer operation. Experiments on benchmark datasets demonstrate that PE-GQNN significantly outperforms existing state-of-the-art methods in both predictive accuracy and uncertainty quantification.

dataset, prediction, quantile, (15 more...)

arXiv.org Machine Learning

2409.18865

Country:

North America > United States > California (0.05)
South America > Brazil > Federal District > Brasília (0.04)
Oceania > Australia > New South Wales (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

AP-VLM: Active Perception Enabled by Vision-Language Models

Sripada, Venkatesh, Carter, Samuel, Guerin, Frank, Ghalamzan, Amir

arXiv.org Artificial IntelligenceSep-26-2024

Abstract-- Active perception enables robots to dynamically gather information by adjusting their viewpoints, a crucial capability for interacting with complex, partially observable environments. In this paper, we present AP-VLM, a novel framework that combines active perception with a Vision-Language Model (VLM) to guide robotic exploration and answer semantic queries. Using a 3D virtual grid overlaid on the scene and orientation adjustments, AP-VLM allows a robotic manipulator to intelligently select optimal viewpoints and orientations to resolve challenging tasks, such as identifying objects in occluded or inclined positions. We evaluate our system on two robotic platforms: a 7-DOF Franka Panda and a 6-DOF UR5, across various scenes with differing object configurations. Our results demonstrate that AP-VLM significantly outperforms passive perception methods and baseline models, including Toward Grounded Common Sense Reasoning (TGCSR), particularly in scenarios where fixed camera views are inadequate. The adaptability of AP-VLM in real-world settings shows promise for enhancing robotic systems' understanding of complex environments, bridging the gap between high-level semantic reasoning and low-level control.

active perception, perception, robot, (16 more...)

arXiv.org Artificial Intelligence

2409.17641

Country: Europe > United Kingdom > England > Surrey (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.46)

Add feedback

Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?

Zhao, Bowen, Dirac, Leo Parker, Varshavskaya, Paulina

arXiv.org Artificial IntelligenceSep-25-2024

Large vision-language models (VLMs) have become state-of-the-art for many computer vision tasks, with in-context learning (ICL) as a popular adaptation strategy for new ones. But can VLMs learn novel concepts purely from visual demonstrations, or are they limited to adapting to the output format of ICL examples? We propose a new benchmark we call Spatial Visual Ambiguity Tasks (SVAT) that challenges state-of-the-art VLMs to learn new visuospatial tasks in-context. We find that VLMs fail to do this zero-shot, and sometimes continue to fail after finetuning. However, adding simpler data to the training by curriculum learning leads to improved ICL performance.

fiducial, svat, vlm, (12 more...)

arXiv.org Artificial Intelligence

2409.1708

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.53)

Add feedback

ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis

Zhou, Fangshuo, Li, Huaxia, Hu, Rui, Wu, Sensen, Feng, Hailin, Du, Zhenhong, Xu, Liuchang

arXiv.org Artificial IntelligenceSep-25-2024

Volunteer Geographic Information (VGI), with its rich variety, large volume, rapid updates, and diverse sources, has become a critical source of geospatial data. However, VGI data from platforms like OSM exhibit significant quality heterogeneity across different data types, particularly with urban building data. To address this, we propose a multi-source geographic data transformation solution, utilizing accessible and complete VGI data to assist in generating urban building footprint data. We also employ a multimodal data generation framework to improve accuracy. First, we introduce a pipeline for constructing an 'image-text-metadata-building footprint' dataset, primarily based on road network data and supplemented by other multimodal data. We then present ControlCity, a geographic data transformation method based on a multimodal diffusion model. This method first uses a pre-trained text-to-image model to align text, metadata, and building footprint data. An improved ControlNet further integrates road network and land-use imagery, producing refined building footprint data. Experiments across 22 global cities demonstrate that ControlCity successfully simulates real urban building patterns, achieving state-of-the-art performance. Specifically, our method achieves an average FID score of 50.94, reducing error by 71.01% compared to leading methods, and a MIoU score of 0.36, an improvement of 38.46%. Additionally, our model excels in tasks like urban morphology transfer, zero-shot city generation, and spatial data completeness assessment. In the zero-shot city task, our method accurately predicts and generates similar urban structures, demonstrating strong generalization. This study confirms the effectiveness of our approach in generating urban building footprint data and capturing complex city characteristics.

building footprint, controlcity, morphology, (16 more...)

arXiv.org Artificial Intelligence

2409.17049

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Beijing > Beijing (0.05)
Asia > Indonesia > Java > Jakarta > Jakarta (0.05)
(18 more...)

Genre: Research Report > New Finding (0.67)

Industry: Transportation > Ground > Road (0.55)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Spatial-Temporal Mixture-of-Graph-Experts for Multi-Type Crime Prediction

Wu, Ziyang, Liu, Fan, Han, Jindong, Liang, Yuxuan, Liu, Hao

arXiv.org Artificial IntelligenceSep-24-2024

As various types of crime continue to threaten public safety and economic development, predicting the occurrence of multiple types of crimes becomes increasingly vital for effective prevention measures. Although extensive efforts have been made, most of them overlook the heterogeneity of different crime categories and fail to address the issue of imbalanced spatial distribution. In this work, we propose a Spatial-Temporal Mixture-of-Graph-Experts (ST-MoGE) framework for collective multiple-type crime prediction. To enhance the model's ability to identify diverse spatial-temporal dependencies and mitigate potential conflicts caused by spatial-temporal heterogeneity of different crime categories, we introduce an attentive-gated Mixture-of-Graph-Experts (MGEs) module to capture the distinctive and shared crime patterns of each crime category. Then, we propose Cross-Expert Contrastive Learning(CECL) to update the MGEs and force each expert to focus on specific pattern modeling, thereby reducing blending and redundancy. Furthermore, to address the issue of imbalanced spatial distribution, we propose a Hierarchical Adaptive Loss Re-weighting (HALR) approach to eliminate biases and insufficient learning of data-scarce regions. To evaluate the effectiveness of our methods, we conduct comprehensive experiments on two real-world crime datasets and compare our results with twelve advanced baselines. The experimental results demonstrate the superiority of our methods.

category, prediction, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2409.15764

Country:

North America > United States > New York (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > China > Hong Kong (0.05)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sparse-to-Dense LiDAR Point Generation by LiDAR-Camera Fusion for 3D Object Detection

Lee, Minseung, Moon, Seokha, Lee, Seung Joon, Kim, Jinkyu

arXiv.org Artificial IntelligenceSep-24-2024

Accurately detecting objects at long distances remains a critical challenge in 3D object detection when relying solely on LiDAR sensors due to the inherent limitations of data sparsity. To address this issue, we propose the LiDAR-Camera Augmentation Network (LCANet), a novel framework that reconstructs LiDAR point cloud data by fusing 2D image features, which contain rich semantic information, generating additional points to improve detection accuracy. LCANet fuses data from LiDAR sensors and cameras by projecting image features into the 3D space, integrating semantic information into the point cloud data. This fused data is then encoded to produce 3D features that contain both semantic and spatial information, which are further refined to reconstruct final points before bounding box prediction. This fusion effectively compensates for LiDAR's weakness in detecting objects at long distances, which are often represented by sparse points. Additionally, due to the sparsity of many objects in the original dataset, which makes effective supervision for point generation challenging, we employ a point cloud completion network to create a complete point cloud dataset that supervises the generation of dense point clouds in our network. Extensive experiments on the KITTI and Waymo datasets demonstrate that LCANet significantly outperforms existing models, particularly in detecting sparse and distant objects.

dataset, detection, point cloud, (15 more...)

arXiv.org Artificial Intelligence

2409.14985

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)

Add feedback

Double-Path Adaptive-correlation Spatial-Temporal Inverted Transformer for Stock Time Series Forecasting

Yan, Wenbo, Tan, Ying

arXiv.org Artificial IntelligenceSep-23-2024

Spatial-temporal graph neural networks (STGNNs) have achieved significant success in various time series forecasting tasks. However, due to the lack of explicit and fixed spatial relationships in stock prediction tasks, many STGNNs fail to perform effectively in this domain. While some STGNNs learn spatial relationships from time series, they often lack comprehensiveness. Research indicates that modeling time series using feature changes as tokens reveals entirely different information compared to using time steps as tokens. To more comprehensively extract dynamic spatial information from stock data, we propose a Double-Path Adaptive-correlation Spatial-Temporal Inverted Transformer (DPA-STIFormer). DPA-STIFormer models each node via continuous changes in features as tokens and introduces a Double Direction Self-adaptation Fusion mechanism. This mechanism decomposes node encoding into temporal and feature representations, simultaneously extracting different spatial correlations from a double path approach, and proposes a Double-path gating mechanism to fuse these two types of correlation information. Experiments conducted on four stock market datasets demonstrate state-of-the-art results, validating the model's superior capability in uncovering latent temporal-correlation patterns.

correlation, information, time step, (12 more...)

arXiv.org Artificial Intelligence

2409.15662

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

HydroVision: LiDAR-Guided Hydrometric Prediction with Vision Transformers and Hybrid Graph Learning

Roudbari, Naghmeh Shafiee, Eicker, Ursula, Poullis, Charalambos, Patterson, Zachary

arXiv.org Artificial IntelligenceSep-23-2024

Hydrometric forecasting is crucial for managing water resources, flood prediction, and environmental protection. Water stations are interconnected, and this connectivity influences the measurements at other stations. However, the dynamic and implicit nature of water flow paths makes it challenging to extract a priori knowledge of the connectivity structure. We hypothesize that terrain elevation significantly affects flow and connectivity. To incorporate this, we use LiDAR terrain elevation data encoded through a Vision Transformer (ViT). The ViT, which has demonstrated excellent performance in image classification by directly applying transformers to sequences of image patches, efficiently captures spatial features of terrain elevation. To account for both spatial and temporal features, we employ GRU blocks enhanced with graph convolution, a method widely used in the literature. We propose a hybrid graph learning structure that combines static and dynamic graph learning. A static graph, derived from transformer-encoded LiDAR data, captures terrain elevation relationships, while a dynamic graph adapts to temporal changes, improving the overall graph representation. We apply graph convolution in two layers through these static and dynamic graphs. Our method makes daily predictions up to 12 days ahead. Empirical results from multiple water stations in Quebec demonstrate that our method significantly reduces prediction error by an average of 10\% across all days, with greater improvements for longer forecasting horizons.

forecasting, prediction, vision transformer, (14 more...)

arXiv.org Artificial Intelligence

2409.15213

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.06)
North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Add feedback

Kriformer: A Novel Spatiotemporal Kriging Approach Based on Graph Transformers

Pan, Renbin, Xiao, Feng, Zhang, Hegui, Shen, Minyu

arXiv.org Machine LearningSep-23-2024

Accurately estimating data in sensor-less areas is crucial for understanding system dynamics, such as traffic state estimation and environmental monitoring. This study addresses challenges posed by sparse sensor deployment and unreliable data by framing the problem as a spatiotemporal kriging task and proposing a novel graph transformer model, Kriformer. This model estimates data at locations without sensors by mining spatial and temporal correlations, even with limited resources. Kriformer utilizes transformer architecture to enhance the model's perceptual range and solve edge information aggregation challenges, capturing spatiotemporal information effectively. A carefully constructed positional encoding module embeds the spatiotemporal features of nodes, while a sophisticated spatiotemporal attention mechanism enhances estimation accuracy. The multi-head spatial interaction attention module captures subtle spatial relationships between observed and unobserved locations. During training, a random masking strategy prompts the model to learn with partial information loss, allowing the spatiotemporal embedding and multi-head attention mechanisms to synergistically capture correlations among locations. Experimental results show that Kriformer excels in representation learning for unobserved locations, validated on two real-world traffic speed datasets, demonstrating its effectiveness in spatiotemporal kriging tasks.

correlation, information, node, (14 more...)

arXiv.org Machine Learning

2409.14906

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
South America > Chile (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Transportation (0.47)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

PyGRF: An improved Python Geographical Random Forest model and case studies in public health and natural disasters

Sun, Kai, Zhou, Ryan Zhenqi, Kim, Jiyeon, Hu, Yingjie

arXiv.org Artificial IntelligenceSep-20-2024

Geographical random forest (GRF) is a recently developed and spatially explicit machine learning model. With the ability to provide more accurate predictions and local interpretations, GRF has already been used in many studies. The current GRF model, however, has limitations in its determination of the local model weight and bandwidth hyperparameters, potentially insufficient numbers of local training samples, and sometimes high local prediction errors. Also, implemented as an R package, GRF currently does not have a Python version which limits its adoption among machine learning practitioners who prefer Python. This work addresses these limitations by introducing theory-informed hyperparameter determination, local training sample expansion, and spatially-weighted local prediction. We also develop a Python-based GRF model and package, PyGRF, to facilitate the use of the model. We evaluate the performance of PyGRF on an example dataset and further demonstrate its use in two case studies in public health and natural disasters.

hyperparameter, prediction, rf model, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1111/tgis.13248

2409.13947

Country:

North America > United States > New York > Erie County > Buffalo (0.14)
North America > United States > New York > Bronx County > New York City (0.04)
Europe > Greece (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Public Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback