Spatial Reasoning
Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition
Dong, Yihang, Chen, Xuhang, Shen, Yanyan, Ng, Michael Kwok-Po, Qian, Tao, Wang, Shuqiang
Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and model framework unity. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a Pre-trained model based Multimodal Mood Reader for cross-subject emotion recognition that utilizes masked brain signal modeling and interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on large scale dataset, and employs Interlinked spatial-temporal attention mechanism to process Differential Entropy(DE) features extracted from EEG data. Subsequently, a multi-level fusion layer is proposed to integrate the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from attention perspective, providing qualitative analysis of emotion-related brain areas, offering valuable insights for affective research in neural signal processing.
A Planet Scale Spatial-Temporal Knowledge Graph Based On OpenStreetMap And H3 Grid
Bรถckling, Martin, Paulheim, Heiko, Detzler, Sarah
Geospatial data plays a central role in modeling our world, for which OpenStreetMap (OSM) provides a rich source of such data. While often spatial data is represented in a tabular format, a graph based representation provides the possibility to interconnect entities which would have been separated in a tabular representation. We propose in our paper a framework which supports a planet scale transformation of OpenStreetMap data into a Spatial Temporal Knowledge Graph. In addition to OpenStreetMap data, we align the different OpenStreetMap geometries on individual h3 grid cells. We compare our constructed spatial knowledge graph to other spatial knowledge graphs and outline our contribution in this paper. As a basis for our computation, we use Apache Sedona as a computational framework for our Spatial Temporal Knowledge Graph construction
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Wu, Wenshan, Mao, Shaoguang, Zhang, Yadong, Xia, Yan, Dong, Li, Cui, Lei, Wei, Furu
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored. Human possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind's Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. VoT aims to elicit spatial reasoning of LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps. We employed VoT for multi-hop spatial reasoning tasks, including natural language navigation, visual navigation, and visual tiling in 2D grid worlds. Experimental results demonstrated that VoT significantly enhances the spatial reasoning abilities of LLMs. Notably, VoT outperformed existing multimodal large language models (MLLMs) in these tasks. While VoT works surprisingly well on LLMs, the ability to generate mental images to facilitate spatial reasoning resembles the mind's eye process, suggesting its potential viability in MLLMs.
Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning
Li, Fangjun, Hogg, David C., Cohn, Anthony G.
Spatial reasoning plays a vital role in both human cognition and machine intelligence, prompting new research into language models' (LMs) capabilities in this regard. However, existing benchmarks reveal shortcomings in evaluating qualitative spatial reasoning (QSR). These benchmarks typically present oversimplified scenarios or unclear natural language descriptions, hindering effective evaluation. We present a novel benchmark for assessing QSR in LMs, which is grounded in realistic 3D simulation data, offering a series of diverse room layouts with various objects and their spatial relationships. This approach provides a more detailed and context-rich narrative for spatial reasoning evaluation, diverging from traditional, toy-task-oriented scenarios. Our benchmark encompasses a broad spectrum of qualitative spatial relationships, including topological, directional, and distance relations. These are presented with different viewing points, varied granularities, and density of relation constraints to mimic real-world complexities. A key contribution is our logic-based consistency-checking tool, which enables the assessment of multiple plausible solutions, aligning with real-world scenarios where spatial relationships are often open to interpretation. Our benchmark evaluation of advanced LMs reveals their strengths and limitations in spatial reasoning. They face difficulties with multi-hop spatial reasoning and interpreting a mix of different view descriptions, pointing to areas for future improvement.
Learning Geospatial Region Embedding with Heterogeneous Graph
Zou, Xingchen, Huang, Jiani, Hao, Xixuan, Yang, Yuhao, Wen, Haomin, Yan, Yibo, Huang, Chao, Liang, Yuxuan
Learning effective geospatial embeddings is crucial for a series of geospatial applications such as city analytics and earth monitoring. However, learning comprehensive region representations presents two significant challenges: first, the deficiency of effective intra-region feature representation; and second, the difficulty of learning from intricate inter-region dependencies. In this paper, we present GeoHG, an effective heterogeneous graph structure for learning comprehensive region embeddings for various downstream tasks. Specifically, we tailor satellite image representation learning through geo-entity segmentation and point-of-interest (POI) integration for expressive intra-regional features. Furthermore, GeoHG unifies informative spatial interdependencies and socio-environmental attributes into a powerful heterogeneous graph to encourage explicit modeling of higher-order inter-regional relationships. The intra-regional features and inter-regional correlations are seamlessly integrated by a model-agnostic graph learning framework for diverse downstream tasks. Extensive experiments demonstrate the effectiveness of GeoHG in geo-prediction tasks compared to existing methods, even under extreme data scarcity (with just 5% of training data). With interpretable region representations, GeoHG exhibits strong generalization capabilities across regions. We will release code and data upon paper notification.
Traffic Scenario Logic: A Spatial-Temporal Logic for Modeling and Reasoning of Urban Traffic Scenarios
Wang, Ruolin, Xu, Yuejiao, Ji, Jianmin
Formal representations of traffic scenarios can be used to generate test cases for the safety verification of autonomous driving. However, most existing methods are limited in highway or highly simplified intersection scenarios due to the intricacy and diversity of traffic scenarios. In response, we propose Traffic Scenario Logic (TSL), which is a spatial-temporal logic designed for modeling and reasoning of urban pedestrian-free traffic scenarios. TSL provides a formal representation of the urban road network that can be derived from OpenDRIVE, i.e., the de facto industry standard of high-definition maps for autonomous driving, enabling the representation of a broad range of traffic scenarios. We implemented the reasoning of TSL using Telingo, i.e., a solver for temporal programs based on the Answer Set Programming, and tested it on different urban road layouts. Demonstrations show the effectiveness of TSL in test scenario generation and its potential value in areas like decision-making and control verification of autonomous driving.
FedASTA: Federated adaptive spatial-temporal attention for traffic flow prediction
Li, Kaiyuan, Zhang, Yihan, Chen, Xinlei
Mobile devices and the Internet of Things (IoT) devices nowadays generate a large amount of heterogeneous spatial-temporal data. It remains a challenging problem to model the spatial-temporal dynamics under privacy concern. Federated learning (FL) has been proposed as a framework to enable model training across distributed devices without sharing original data which reduce privacy concern. Personalized federated learning (PFL) methods further address data heterogenous problem. However, these methods don't consider natural spatial relations among nodes. For the sake of modeling spatial relations, Graph Neural Netowork (GNN) based FL approach have been proposed. But dynamic spatial-temporal relations among edge nodes are not taken into account. Several approaches model spatial-temporal dynamics in a centralized environment, while less effort has been made under federated setting. To overcome these challeges, we propose a novel Federated Adaptive Spatial-Temporal Attention (FedASTA) framework to model the dynamic spatial-temporal relations. On the client node, FedASTA extracts temporal relations and trend patterns from the decomposed terms of original time series. Then, on the server node, FedASTA utilize trend patterns from clients to construct adaptive temporal-spatial aware graph which captures dynamic correlation between clients. Besides, we design a masked spatial attention module with both static graph and constructed adaptive graph to model spatial dependencies among clients. Extensive experiments on five real-world public traffic flow datasets demonstrate that our method achieves state-of-art performance in federated scenario. In addition, the experiments made in centralized setting show the effectiveness of our novel adaptive graph construction approach compared with other popular dynamic spatial-temporal aware methods.
CReMa: Crisis Response through Computational Identification and Matching of Cross-Lingual Requests and Offers Shared on Social Media
Lamsal, Rabindra, Read, Maria Rodriguez, Karunasekera, Shanika, Imran, Muhammad
During times of crisis, social media platforms play a vital role in facilitating communication and coordinating resources. Amidst chaos and uncertainty, communities often rely on these platforms to share urgent pleas for help, extend support, and organize relief efforts. However, the sheer volume of conversations during such periods, which can escalate to unprecedented levels, necessitates the automated identification and matching of requests and offers to streamline relief operations. This study addresses the challenge of efficiently identifying and matching assistance requests and offers on social media platforms during emergencies. We propose CReMa (Crisis Response Matcher), a systematic approach that integrates textual, temporal, and spatial features for multi-lingual request-offer matching. By leveraging CrisisTransformers, a set of pre-trained models specific to crises, and a cross-lingual embedding space, our methodology enhances the identification and matching tasks while outperforming strong baselines such as RoBERTa, MPNet, and BERTweet, in classification tasks, and Universal Sentence Encoder, Sentence Transformers in crisis embeddings generation tasks. We introduce a novel multi-lingual dataset that simulates scenarios of help-seeking and offering assistance on social media across the 16 most commonly used languages in Australia. We conduct comprehensive cross-lingual experiments across these 16 languages, also while examining trade-offs between multiple vector search strategies and accuracy. Additionally, we analyze a million-scale geotagged global dataset to comprehend patterns in relation to seeking help and offering assistance on social media. Overall, these contributions advance the field of crisis informatics and provide benchmarks for future research in the area.
STG-Mamba: Spatial-Temporal Graph Learning via Selective State Space Model
Li, Lincan, Wang, Hanchen, Zhang, Wenjie, Coster, Adelle
Spatial-Temporal Graph (STG) data is characterized as dynamic, heterogenous, and non-stationary, leading to the continuous challenge of spatial-temporal graph learning. In the past few years, various GNN-based methods have been proposed to solely focus on mimicking the relationships among node individuals of the STG network, ignoring the significance of modeling the intrinsic features that exist in STG system over time. In contrast, modern Selective State Space Models (SSSMs) present a new approach which treat STG Network as a system, and meticulously explore the STG system's dynamic state evolution across temporal dimension. In this work, we introduce Spatial-Temporal Graph Mamba (STG-Mamba) as the first exploration of leveraging the powerful selective state space models for STG learning by treating STG Network as a system, and employing the Spatial-Temporal Selective State Space Module (ST-S3M) to precisely focus on the selected STG latent features. Furthermore, to strengthen GNN's ability of modeling STG data under the setting of selective state space models, we propose Kalman Filtering Graph Neural Networks (KFGN) for dynamically integrate and upgrade the STG embeddings from different temporal granularities through a learnable Kalman Filtering statistical theory-based approach. Extensive empirical studies are conducted on three benchmark STG forecasting datasets, demonstrating the performance superiority and computational efficiency of STG-Mamba. It not only surpasses existing state-of-the-art methods in terms of STG forecasting performance, but also effectively alleviate the computational bottleneck of large-scale graph networks in reducing the computational cost of FLOPs and test inference time. The implementation code is available at: \url{https://github.com/LincanLi98/STG-Mamba}.
ST-Mamba: Spatial-Temporal Selective State Space Model for Traffic Flow Prediction
Shao, Zhiqi, Bell, Michael G. H., Wang, Ze, Geers, D. Glenn, Xi, Haoning, Gao, Junbin
Traffic flow prediction, a critical aspect of intelligent transportation systems, has been increasingly popular in the field of artificial intelligence, driven by the availability of extensive traffic data. The current challenges of traffic flow prediction lie in integrating diverse factors while balancing the trade-off between computational complexity and the precision necessary for effective long-range and large-scale predictions. To address these challenges, we introduce a Spatial-Temporal Selective State Space (ST-Mamba) model, which is the first to leverage the power of spatial-temporal learning in traffic flow prediction without using graph modeling. The ST-Mamba model can effectively capture the long-range dependency for traffic flow data, thereby avoiding the issue of over-smoothing. The proposed ST-Mamba model incorporates an effective Spatial-Temporal Mixer (ST-Mixer) to seamlessly integrate spatial and temporal data processing into a unified framework and employs a Spatial-Temporal Selective State Space (ST-SSM) block to improve computational efficiency. The proposed ST-Mamba model, specifically designed for spatial-temporal data, simplifies processing procedure and enhances generalization capabilities, thereby significantly improving the accuracy of long-range traffic flow prediction. Compared to the previous state-of-the-art (SOTA) model, the proposed ST-Mamba model achieves a 61.11\% improvement in computational speed and increases prediction accuracy by 0.67\%. Extensive experiments with real-world traffic datasets demonstrate that the \textsf{ST-Mamba} model sets a new benchmark in traffic flow prediction, achieving SOTA performance in computational efficiency for both long- and short-range predictions and significantly improving the overall efficiency and effectiveness of traffic management.