Spatial interaction



Generating Origin-Destination Matrices in Neural Spatial Interaction Models

Neural Information Processing Systems

Agent-based models (ABMs) are proliferating as decision-making tools across policy areas in transportation, economics, and epidemiology. In these models, a central object of interest is the discrete origin-destination matrix which captures spatial interactions and agent trip counts between locations. Existing approaches resort to continuous approximations of this matrix and subsequent ad-hoc discretisations in order to perform ABM simulation and calibration. This impedes conditioning on partially observed summary statistics, fails to explore the multimodal matrix distribution over a discrete combinatorial support, and incurs discretisation errors. To address these challenges, we introduce a computationally efficient framework that scales linearly with the number of origin-destination pairs, operates directly on the discrete combinatorial space, and learns the agents' trip intensity through a neural differential equation that embeds spatial interactions. Our approach outperforms the prior art in terms of reconstruction error and ground truth matrix coverage, at a fraction of the computational cost. We demonstrate these benefits in two large-scale spatial mobility ABMs in Washington, DC and Cambridge, UK.
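The central object the abstract describes is a discrete origin-destination matrix of integer trip counts. As a minimal sketch of what "operating directly on the discrete combinatorial space" means (this is an illustration, not the paper's method: the function names and the Poisson sampling choice are assumptions), one can draw each entry as a Poisson count from a trip intensity rather than rounding a continuous approximation:

```python
import math
import random

def sample_od_matrix(intensity, rng):
    """Draw a discrete origin-destination matrix with T_ij ~ Poisson(Lambda_ij).

    `intensity` maps (origin, destination) pairs to a non-negative trip
    intensity Lambda_ij; the sampled matrix lives directly on the discrete
    combinatorial support (integer trip counts), with no ad-hoc rounding.
    """
    def poisson(lam):
        # Knuth's algorithm: multiply uniforms until the product
        # drops below exp(-lam); the number of factors is the sample.
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    return {od: poisson(lam) for od, lam in intensity.items()}

rng = random.Random(0)
intensity = {("A", "B"): 3.0, ("A", "C"): 0.5, ("B", "C"): 7.0}
matrix = sample_od_matrix(intensity, rng)
```

In the paper's framework the intensity itself is learned by a neural differential equation; here it is simply given.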


A Gravity-informed Spatiotemporal Transformer for Human Activity Intensity Prediction

Wang, Yi, Wang, Zhenghong, Zhang, Fan, Kang, Chaogui, Ruan, Sijie, Zhu, Di, Tang, Chengling, Ma, Zhongfu, Zhang, Weiyu, Zheng, Yu, Yu, Philip S., Liu, Yu

arXiv.org Artificial Intelligence

Human activity intensity prediction is crucial to many location-based services. Despite tremendous progress in modeling the dynamics of human activity, most existing methods overlook the physical constraints of spatial interaction, leading to uninterpretable spatial correlations and an over-smoothing phenomenon. To address these limitations, this work proposes a physics-informed deep learning framework, the Gravity-informed Spatiotemporal Transformer (Gravityformer), which integrates the universal law of gravitation to refine transformer attention. Specifically, it (1) estimates two spatially explicit mass parameters from spatiotemporal embedding features, (2) models spatial interaction in an end-to-end neural network using a proposed adaptive gravity model to learn the physical constraint, and (3) utilizes the learned spatial interaction to guide and mitigate the over-smoothing phenomenon in transformer attention. Moreover, a parallel spatiotemporal graph convolution transformer is proposed to balance coupled spatial and temporal learning. Systematic experiments on six real-world large-scale activity datasets demonstrate the quantitative and qualitative superiority of our model over state-of-the-art benchmarks. Additionally, the learned gravity attention matrix can not only be disentangled and interpreted in terms of geographical laws, but also improves generalization in zero-shot cross-region inference. This work provides novel insight into integrating physical laws with deep learning for spatiotemporal prediction. Index Terms: human activity intensity prediction; gravity model; spatial interaction; physics-informed machine learning; over-smoothing phenomenon; spatiotemporal graph neural network. This work is supported by the National Natural Science Foundation of China (Grants 42430106, 42371468, and 424B2013).
Yi Wang, Zhenghong Wang, Fan Zhang, Chengling Tang, Weiyu Zhang, and Yu Liu are with the Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871, China. Chaogui Kang is with the National Engineering Research Center of Geographic Information System, China University of Geosciences (Wuhan), 430074, China. Sijie Ruan is with the School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China. Di Zhu and Zhongfu Ma are with the Department of Geography, Environment and Society, University of Minnesota, Twin Cities, Minneapolis, MN 55455, USA. Yu Zheng is with JD iCity, JD Technology, Beijing 100176, China. Philip S. Yu is with the Department of Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA.
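The core idea of gravity-refined attention can be sketched compactly. This is a hedged illustration, not Gravityformer's actual architecture: the function names, the fixed `beta` exponent, and the additive-in-log-space modulation are all assumptions; in the paper the mass parameters are learned from spatiotemporal embeddings.

```python
import math

def gravity_weights(mass, coords, beta=1.0):
    """Pairwise gravity weights G_ij = m_i * m_j / d_ij**beta.

    `mass` holds one (here given, in the paper learned) mass per location,
    `coords` the location positions. The diagonal keeps the mass product
    alone to avoid dividing by d_ii = 0.
    """
    n = len(mass)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                g[i][j] = mass[i] * mass[j]
            else:
                d = math.dist(coords[i], coords[j])
                g[i][j] = mass[i] * mass[j] / d ** beta
    return g

def gravity_refined_attention(scores, g):
    """Modulate raw attention scores with gravity weights (added in log
    space), then re-normalise each row with a softmax so rows sum to one.
    Distant, low-mass pairs get damped, which counteracts over-smoothing."""
    out = []
    for row_s, row_g in zip(scores, g):
        logits = [s + math.log(w) for s, w in zip(row_s, row_g)]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

att = gravity_refined_attention(
    [[0.0, 0.0], [0.0, 0.0]],
    gravity_weights([1.0, 2.0], [(0.0, 0.0), (3.0, 4.0)]),
)
```

With uniform raw scores, attention now concentrates on nearer, heavier locations rather than spreading evenly.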


OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

Jia, Mengdi, Qi, Zekun, Zhang, Shaochen, Zhang, Wenyao, Yu, Xinqiang, He, Jiawei, Wang, He, Yi, Li

arXiv.org Artificial Intelligence

Spatial reasoning is a key aspect of cognitive psychology and remains a bottleneck for current vision-language models (VLMs). While extensive research has aimed to evaluate or improve VLMs' understanding of basic spatial relations, such as distinguishing left from right, near from far, and object counting, these tasks cover only the most elementary layer of spatial reasoning and are largely approaching saturation in the latest reasoning models. In this work, we introduce OmniSpatial, a comprehensive and challenging benchmark for spatial reasoning, grounded in cognitive psychology. OmniSpatial covers four major categories: dynamic reasoning, complex spatial logic, spatial interaction, and perspective-taking, with 50 fine-grained subcategories. Through careful manual annotation, we construct over 8.4K question-answer pairs. Extensive experiments show that both open- and closed-source VLMs exhibit significant limitations in comprehensive spatial reasoning. We also explore two strategies, PointGraph (explicit scene graph cues) and SpatialCoT (novel-view chain-of-thought), to bolster spatial reasoning.


ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting

Yan, Xiaoyang, Pei, Muleilan, Shen, Shaojie

arXiv.org Artificial Intelligence

3D occupancy prediction is critical for comprehensive scene understanding in vision-centric autonomous driving. Recent advances have explored utilizing 3D semantic Gaussians to model occupancy while reducing computational overhead, but they remain constrained by insufficient multi-view spatial interaction and limited multi-frame temporal consistency. To overcome these issues, in this paper, we propose a novel Spatial-Temporal Gaussian Splatting (ST-GS) framework to enhance both spatial and temporal modeling in existing Gaussian-based pipelines. Specifically, we develop a guidance-informed spatial aggregation strategy within a dual-mode attention mechanism to strengthen spatial interaction in Gaussian representations. Furthermore, we introduce a geometry-aware temporal fusion scheme that effectively leverages historical context to improve temporal continuity in scene completion. Extensive experiments on the large-scale nuScenes occupancy prediction benchmark showcase that our proposed approach not only achieves state-of-the-art performance but also delivers markedly better temporal consistency compared to existing Gaussian-based methods.


MSRFormer: Road Network Representation Learning using Multi-scale Feature Fusion of Heterogeneous Spatial Interactions

Yang, Jian, Wu, Jiahui, Fang, Li, Fan, Hongchao, Zhang, Bianying, Zhao, Huijie, Yang, Guangyi, Xin, Rui, You, Xiong

arXiv.org Artificial Intelligence

Transforming road network data into vector representations using deep learning has proven effective for road network analysis. However, the heterogeneous and hierarchical nature of urban road networks poses challenges for accurate representation learning. Graph neural networks, which aggregate features from neighboring nodes, often struggle due to their homogeneity assumption and focus on a single structural scale. To address these issues, this paper presents MSRFormer, a novel road network representation learning framework that integrates multi-scale spatial interactions by addressing their flow heterogeneity and long-distance dependencies. It uses spatial flow convolution to extract small-scale features from large trajectory datasets, and identifies scale-dependent spatial interaction regions to capture the spatial structure of road networks and flow heterogeneity. By employing a graph transformer, MSRFormer effectively captures complex spatial dependencies across multiple scales. The spatial interaction features are fused using residual connections and fed to a contrastive learning algorithm to derive the final road network representation. Validation on two real-world datasets demonstrates that MSRFormer outperforms baseline methods in two road network analysis tasks. The performance gains suggest that traffic-related tasks benefit more from incorporating trajectory data, with improvements of up to 16% over the most competitive baseline on complex road network structures. This research provides a practical framework for developing task-agnostic road network representation models and highlights distinct association patterns in the interplay between scale effects and flow heterogeneity of spatial interactions.
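The "spatial flow convolution" idea, message passing weighted by observed trajectory flows rather than plain adjacency, can be sketched as follows. This is an illustrative simplification under assumed names and shapes, not MSRFormer's actual layer: real implementations use learned weight matrices and nonlinearities.

```python
def flow_convolution(features, flows):
    """One spatial-flow convolution step: each road segment aggregates
    neighbour features weighted by trajectory flow volume, so heterogeneous
    interaction strengths enter the message passing directly.

    `features[i]` is the feature vector of segment i; `flows[i][j]` is the
    trajectory flow from segment i to segment j (0 if unconnected).
    """
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        total = sum(flows[i]) or 1.0  # avoid division by zero for sinks
        agg = [0.0] * dim
        for j in range(n):
            w = flows[i][j] / total
            for k in range(dim):
                agg[k] += w * features[j][k]
        # residual connection, mirroring the fusion step described above
        out.append([f + a for f, a in zip(features[i], agg)])
    return out

out = flow_convolution([[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [0.0, 0.0]])
```

A segment with heavy flow toward a neighbour absorbs more of that neighbour's features; an isolated segment keeps its own representation unchanged.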



Feasible Action Space Reduction for Quantifying Causal Responsibility in Continuous Spatial Interactions

George, Ashwin, Siebert, Luciano Cavalcante, Abbink, David A., Zgonnikov, Arkady

arXiv.org Artificial Intelligence

Understanding the causal influence of one agent on another is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots in human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus limiting real-world use when understanding responsibility in spatial interactions. Based on the assumption that spatially interacting agents are embedded in a scene and must follow an action at each instant, Feasible Action-Space Reduction (FeAR) was proposed as a metric for causal responsibility in a grid-world setting with discrete actions. Since real-world interactions involve continuous action spaces, this paper proposes a formulation of the FeAR metric for measuring causal responsibility in space-continuous interactions. We illustrate the utility of the metric in prototypical space-sharing conflicts, and showcase its applications in analysing backward-looking responsibility and in estimating forward-looking responsibility to guide agent decision making. Our results highlight the potential of the FeAR metric for designing and engineering artificial agents, as well as for assessing the responsibility of agents around humans.
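The intuition behind a feasible-action-space reduction metric can be made concrete for a one-dimensional continuous action space. This is a deliberately simplified sketch, not the paper's exact formulation: the interval representation and the ratio form are assumptions for illustration.

```python
def fear_ratio(feasible_alone, feasible_with_other):
    """FeAR-style ratio for a 1-D continuous action space: the fraction of
    agent A's feasible action measure that agent B's presence removes.

    Each argument is a list of (lo, hi) intervals of feasible actions
    (e.g. accelerations). Returns 0 when A's space was empty to begin with.
    """
    def measure(intervals):
        return sum(hi - lo for lo, hi in intervals)

    m_alone = measure(feasible_alone)
    m_with = measure(feasible_with_other)
    return (m_alone - m_with) / m_alone if m_alone > 0 else 0.0

# Agent B blocks half of A's feasible accelerations:
r = fear_ratio([(0.0, 4.0)], [(0.0, 2.0)])
```

A ratio of 0 means B does not constrain A at all; a ratio approaching 1 means B leaves A almost no feasible action, suggesting high causal responsibility for the resulting behaviour.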


GeoAI-Enhanced Community Detection on Spatial Networks with Graph Deep Learning

Liang, Yunlei, Zhu, Jiawei, Ye, Wen, Gao, Song

arXiv.org Artificial Intelligence

Spatial networks are useful for modeling geographic phenomena where spatial interaction plays an important role. To analyze spatial networks and their internal structures, graph-based methods such as community detection have been widely used. Community detection aims to extract strongly connected components from the network and reveal hidden relationships between nodes, but such methods usually do not involve attribute information. To consider edge-based interactions and node attributes together, this study proposes a family of GeoAI-enhanced unsupervised community detection methods called region2vec, based on Graph Attention Networks (GAT) and Graph Convolutional Networks (GCN). The region2vec methods generate node neural embeddings based on attribute similarity, geographic adjacency, and spatial interactions, and then extract network communities from the node embeddings using agglomerative clustering. The proposed GeoAI-based methods are compared with multiple baselines and perform best when one wants to simultaneously maximize node attribute similarity and spatial interaction intensity within the spatial network communities. The method is further applied to the shortage-area delineation problem in public health and demonstrates its promise for regionalization problems.
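The two-stage pipeline, embed nodes using attributes plus interaction-weighted neighbours, then agglomeratively cluster the embeddings, can be sketched in a few lines. This is a hedged toy version: the single averaging step stands in for the GAT/GCN encoder, and the single-linkage merge loop stands in for a production clustering routine; all names here are assumptions.

```python
import math

def embed(attrs, weights):
    """One propagation step in the spirit of region2vec: each node's
    embedding mixes its own attributes with neighbours' attributes,
    weighted by spatial-interaction strength `weights[i][j]`."""
    n, dim = len(attrs), len(attrs[0])
    out = []
    for i in range(n):
        z = sum(weights[i]) + 1.0  # self-weight of 1
        emb = [a / z for a in attrs[i]]
        for j in range(n):
            for k in range(dim):
                emb[k] += weights[i][j] * attrs[j][k] / z
        out.append(emb)
    return out

def agglomerate(emb, k):
    """Plain single-linkage agglomerative clustering on the embeddings:
    repeatedly merge the closest pair of clusters until k remain."""
    clusters = [[i] for i in range(len(emb))]

    def dist(a, b):
        return min(math.dist(emb[i], emb[j]) for i in a for j in b)

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] += clusters.pop(j)
    return clusters

attrs = [[0.0], [0.1], [5.0], [5.1]]
weights = [[0.0] * 4 for _ in range(4)]  # no interaction: pure attribute clustering
communities = agglomerate(embed(attrs, weights), 2)
```

With nonzero interaction weights, strongly interacting nodes are pulled toward each other in embedding space and so tend to land in the same community, which is the balance the study evaluates.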


Graph Fourier Neural ODEs: Bridging Spatial and Temporal Multiscales in Molecular Dynamics

Sun, Fang, Huang, Zijie, Wang, Haixin, Cao, Yadi, Luo, Xiao, Wang, Wei, Sun, Yizhou

arXiv.org Artificial Intelligence

Molecular dynamics simulations are crucial for understanding complex physical, chemical, and biological processes at the atomic level. However, accurately capturing interactions across multiple spatial and temporal scales remains a significant challenge. We present a novel framework that jointly models spatial and temporal multiscale interactions in molecular dynamics. Our approach leverages Graph Fourier Transforms to decompose molecular structures into different spatial scales and employs Neural Ordinary Differential Equations to model the temporal dynamics in a curated manner influenced by the spatial modes. We evaluate our model on the MD17 dataset, demonstrating consistent performance improvements over state-of-the-art baselines across multiple molecules, particularly under challenging conditions such as irregular timestep sampling and long-term prediction horizons. Ablation studies confirm the significant contributions of both spatial and temporal multiscale modeling components.
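The spatial-scale decomposition the abstract relies on is the Graph Fourier Transform: projecting a node signal onto the eigenvectors of the graph Laplacian, where small eigenvalues correspond to smooth, large-scale modes and large eigenvalues to fine-scale ones. A minimal sketch (function names are assumptions; the paper couples these modes to a Neural ODE, which is omitted here):

```python
import numpy as np

def graph_fourier(adj, signal):
    """Graph Fourier Transform of a node signal: project onto the
    eigenvectors of the combinatorial Laplacian L = D - A. Low-frequency
    modes (small eigenvalues) capture large spatial scales, high-frequency
    modes capture fine ones."""
    lap = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(lap)  # symmetric L -> real spectrum
    coeffs = eigvecs.T @ signal             # forward GFT
    return eigvals, eigvecs, coeffs

def inverse_graph_fourier(eigvecs, coeffs):
    """Exact reconstruction: the eigenvector basis is orthonormal."""
    return eigvecs @ coeffs

# Path graph on 4 atoms (a toy stand-in for a molecular graph)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])
vals, vecs, coeffs = graph_fourier(adj, x)
recon = inverse_graph_fourier(vecs, coeffs)
```

Truncating `coeffs` to the lowest-frequency entries before reconstruction yields a coarse-scale version of the signal, which is the per-scale representation such a framework can evolve in time.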