Spatial Reasoning
Sphere2Vec: A General-Purpose Location Representation Learning over a Spherical Surface for Large-Scale Geospatial Predictions
Mai, Gengchen, Xuan, Yao, Zuo, Wenyun, He, Yutong, Song, Jiaming, Ermon, Stefano, Janowicz, Krzysztof, Lao, Ni
Generating learning-friendly representations for points in space is a fundamental and long-standing problem in ML. Recently, multi-scale encoding schemes (such as Space2Vec and NeRF) were proposed to directly encode any point in 2D/3D Euclidean space as a high-dimensional vector, and has been successfully applied to various geospatial prediction and generative tasks. However, all current 2D and 3D location encoders are designed to model point distances in Euclidean space. So when applied to large-scale real-world GPS coordinate datasets, which require distance metric learning on the spherical surface, both types of models can fail due to the map projection distortion problem (2D) and the spherical-to-Euclidean distance approximation error (3D). To solve these problems, we propose a multi-scale location encoder called Sphere2Vec which can preserve spherical distances when encoding point coordinates on a spherical surface. We developed a unified view of distance-reserving encoding on spheres based on the DFS. We also provide theoretical proof that the Sphere2Vec preserves the spherical surface distance between any two points, while existing encoding schemes do not. Experiments on 20 synthetic datasets show that Sphere2Vec can outperform all baseline models on all these datasets with up to 30.8% error rate reduction. We then apply Sphere2Vec to three geo-aware image classification tasks - fine-grained species recognition, Flickr image recognition, and remote sensing image classification. Results on 7 real-world datasets show the superiority of Sphere2Vec over multiple location encoders on all three tasks. Further analysis shows that Sphere2Vec outperforms other location encoder models, especially in the polar regions and data-sparse areas because of its nature for spherical surface distance preservation. Code and data are available at https://gengchenmai.github.io/sphere2vec-website/.
S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture
Xue, Ye, Klabjan, Diego, Utke, Jean
Multimodal multitask learning has attracted an increasing interest in recent years. Singlemodal models have been advancing rapidly and have achieved astonishing results on various tasks across multiple domains. Multimodal learning offers opportunities for further improvements by integrating data from multiple modalities. Many methods are proposed to learn on a specific type of multimodal data, such as vision and language data. A few of them are designed to handle several modalities and tasks at a time. In this work, we extend and improve Omninet, an architecture that is capable of handling multiple modalities and tasks at a time, by introducing cross-cache attention, integrating patch embeddings for vision inputs, and supporting structured data. The proposed Structured-data-enhanced Omninet (S-Omninet) is a universal model that is capable of learning from structured data of various dimensions effectively with unstructured data through cross-cache attention, which enables interactions among spatial, temporal, and structured features. We also enhance spatial representations in a spatial cache with patch embeddings. We evaluate the proposed model on several multimodal datasets and demonstrate a significant improvement over the baseline, Omninet.
Exploring Spatial-Temporal Variations of Public Discourse on Social Media: A Case Study on the First Wave of the Coronavirus Pandemic in Italy
Michael, Anslow, Martina, Galletti
This paper proposes a methodology for exploring how linguistic behaviour on social media can be used to explore societal reactions to important events such as those that transpired during the SARS CoV2 pandemic. In particular, where spatial and temporal aspects of events are important features. Our methodology consists of grounding spatial-temporal categories in tweet usage trends using time-series analysis and clustering. Salient terms in each category were then identified through qualitative comparative analysis based on scaled f-scores aggregated into hand-coded categories. To exemplify this approach, we conducted a case study on the first wave of the coronavirus in Italy. We used our proposed methodology to explore existing psychological observations which claimed that physical distance from events affects what is communicated about them. We confirmed these findings by showing that the epicentre of the disease and peripheral regions correspond to clear time-series clusters and that those living in the epicentre of the SARS CoV2 outbreak were more focused on solidarity and policy than those from more peripheral regions. Furthermore, we also found that temporal categories corresponded closely to policy changes during the handling of the pandemic.
Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction
Guo, Aoqi, Wu, Junnan, Gao, Peng, Zhu, Wenbo, Guo, Qinwen, Gao, Dazhi, Wang, Yujun
Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input features and improve the estimation accuracy of the speech pre-separation module by avoiding information loss caused by direct dimensionality reduction in other models. Furthermore, we introduce a multi-head cross-attention mechanism that enhances the neural beamformer's perception of spatial information by making full use of the spatial information received by the array. Experimental results demonstrate that our approach, which incorporates a more reasonable target mask estimation network and a spatial information-based cross-attention mechanism into the neural beamformer, effectively improves speech separation performance.
Survey of Federated Learning Models for Spatial-Temporal Mobility Applications
Belal, Yacine, Mokhtar, Sonia Ben, Haddadi, Hamed, Wang, Jaron, Mashhadi, Afra
Spatial temporal mobility data collected by location-based services (LBS) [42] and other means such as Call Data Records (CDR), WiFi hotspots, smart watches, cars, etc. is very useful from a socio-economical perspective as it is at the heart of many useful applications (e.g., navigation, geo-located search, geo-located games) and it allows answering numerous societal research questions [51]. For example, Call Data Records have been successfully used to provide real-time traffic anomaly as well as event detection [90, 92], and a variety of mobility datasets have been used in shaping policies for urban communities [31] or epidemic management in the public health domain [80, 79]. From an individual-level perspective, users can benefit from personalized recommendations when they are encouraged to share their location data with third parties [22]. While there is no doubt about the usefulness of location-based applications, privacy concerns regarding the collection and sharing of individuals' mobility traces or aggregated flow of movements have prevented the data from being utilized to their full potential [87, 9, 53]. Indeed, various studies have shown that numerous threats are open if location data falls into the hands of inappropriate parties. These threats include re-identification [68], the inference of sensitive information about users [53, 94](e.g., their home and work locations, religious beliefs, political interests or sexual
Imitation with Spatial-Temporal Heatmap: 2nd Place Solution for NuPlan Challenge
Hu, Yihan, Li, Kun, Liang, Pingyuan, Qian, Jingyu, Yang, Zhening, Zhang, Haichao, Shao, Wenxin, Ding, Zhuangzhuang, Xu, Wei, Liu, Qiang
This paper presents our 2nd place solution for the NuPlan Challenge 2023. Autonomous driving in real-world scenarios is highly complex and uncertain. Achieving safe planning in the complex multimodal scenarios is a highly challenging task. Our approach, Imitation with Spatial-Temporal Heatmap, adopts the learning form of behavior cloning, innovatively predicts the future multimodal states with a heatmap representation, and uses trajectory refinement techniques to ensure final safety. The experiment shows that our method effectively balances the vehicle's progress and safety, generating safe and comfortable trajectories. In the NuPlan competition, we achieved the second highest overall score, while obtained the best scores in the ego progress and comfort metrics.
Spatio-temporal Diffusion Point Processes
Yuan, Yuan, Ding, Jingtao, Shao, Chenyang, Jin, Depeng, Li, Yong
Spatio-temporal point process (STPP) is a stochastic collection of events accompanied with time and space. Due to computational complexities, existing solutions for STPPs compromise with conditional independence between time and space, which consider the temporal and spatial distributions separately. The failure to model the joint distribution leads to limited capacities in characterizing the spatio-temporal entangled interactions given past events. In this work, we propose a novel parameterization framework for STPPs, which leverages diffusion models to learn complex spatio-temporal joint distributions. We decompose the learning of the target joint distribution into multiple steps, where each step can be faithfully described by a Gaussian distribution. To enhance the learning of each step, an elaborated spatio-temporal co-attention module is proposed to capture the interdependence between the event time and space adaptively. For the first time, we break the restrictions on spatio-temporal dependencies in existing solutions, and enable a flexible and accurate modeling paradigm for STPPs. Extensive experiments from a wide range of fields, such as epidemiology, seismology, crime, and urban mobility, demonstrate that our framework outperforms the state-of-the-art baselines remarkably, with an average improvement of over 50%. Further in-depth analyses validate its ability to capture spatio-temporal interactions, which can learn adaptively for different scenarios. The datasets and source code are available online: https://github.com/tsinghua-fib-lab/Spatio-temporal-Diffusion-Point-Processes.
Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation
Lin, Xingyu, Qi, Carl, Zhang, Yunchu, Huang, Zhiao, Fragkiadaki, Katerina, Li, Yunzhu, Gan, Chuang, Held, David
Abstract: Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels. Previous methods typically either focus on short-horizon tasks or make the strong assumption that full-state information is available. However, full states of deformable objects are often unavailable. In this paper, we propose PlAnning with Spatial and Temporal Abstraction (PASTA), which incorporates both spatial abstraction (reasoning about objects and their relations to each other) and temporal abstraction (reasoning over skills instead of low-level actions). Our framework maps high-dimension 3D point clouds into a set of latent vectors and plans skill sequences with the latent set representation. Our method can solve challenging, novel sequential deformable object manipulation tasks in the real world, which require combining multiple tool-use skills such as cutting with a knife, pushing with a pusher, and spreading dough with a roller. Additional materials can be found on our project website.
Learned spatial data partitioning
Hori, Keizo, Sasaki, Yuya, Amagata, Daichi, Murosaki, Yuki, Onizuka, Makoto
Due to the significant increase in the size of spatial data, it is essential to use distributed parallel processing systems to efficiently analyze spatial data. In this paper, we first study learned spatial data partitioning, which effectively assigns groups of big spatial data to computers based on locations of data by using machine learning techniques. We formalize spatial data partitioning in the context of reinforcement learning and develop a novel deep reinforcement learning algorithm. Our learning algorithm leverages features of spatial data partitioning and prunes ineffective learning processes to find optimal partitions efficiently. Our experimental study, which uses Apache Sedona and real-world spatial data, demonstrates that our method efficiently finds partitions for accelerating distance join queries and reduces the workload run time by up to 59.4%.
Spatial-Temporal Graph Learning with Adversarial Contrastive Adaptation
Zhang, Qianru, Huang, Chao, Xia, Lianghao, Wang, Zheng, Yiu, Siuming, Han, Ruihua
Spatial-temporal graph learning has emerged as a promising solution for modeling structured spatial-temporal data and learning region representations for various urban sensing tasks such as crime forecasting and traffic flow prediction. However, most existing models are vulnerable to the quality of the generated region graph due to the inaccurate graph-structured information aggregation schema. The ubiquitous spatial-temporal data noise and incompleteness in real-life scenarios pose challenges in generating high-quality region representations. To address this challenge, we propose a new spatial-temporal graph learning model (GraphST) for enabling effective self-supervised learning. Our proposed model is an adversarial contrastive learning paradigm that automates the distillation of crucial multi-view self-supervised information for robust spatial-temporal graph augmentation. We empower GraphST to adaptively identify hard samples for better self-supervision, enhancing the representation discrimination ability and robustness. In addition, we introduce a cross-view contrastive learning paradigm to model the inter-dependencies across view-specific region representations and preserve underlying relation heterogeneity. We demonstrate the superiority of our proposed GraphST method in various spatial-temporal prediction tasks on real-life datasets. We release our model implementation via the link: \url{https://github.com/HKUDS/GraphST}.