Spatial Reasoning
Neural networks for geospatial data
Geostatistics, the analysis of geocoded data, is traditionally based on stochastic process models, which offer a coherent way to model data at any finite collection of locations while ensuring the generalizability of inference to the entire region. Gaussian processes (GPs), with a mean function capturing the effects of covariates and a covariance function encoding the spatial dependence, are a staple of geostatistical analysis, offering theoretical guarantees and practical benefits. GPs are flexible enough to model any smooth spatial surface and can be specified parsimoniously with covariance functions using a very small set of parameters. The spatial covariance parameters offer insights into the smoothness and spatial properties of the response process (Stein, 1999). The finite-dimensional realizations of a GP are multivariate Gaussian, thereby offering estimates of the mean and covariance parameters via convenient maximization of the Gaussian likelihood, and predictions at new locations by using conditional Gaussian distributions (see, e.g., Banerjee et al., 2014; Cressie and Wikle, 2015, for a detailed exposition of GP models for spatial and spatio-temporal data). Also, computational roadblocks to using GPs for large spatial data have been greatly mitigated by recent advances (see Heaton et al., 2019, for a recent review of scalable GP approaches). The mean function of a Gaussian process is often modeled as a linear regression on the covariates. The growing popularity and accessibility of machine learning algorithms such as neural networks, random forests, and gradient boosted trees, which are capable of modeling complex non-linear relationships, has heralded a paradigm shift. Practitioners are increasingly shunning models with parametric assumptions like linearity in favor of these machine learning approaches, which can capture non-linearity and high-order interactions in a data-driven manner. The field of spatial statistics has not been insulated from this machine learning revolution.
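To make the conditional-Gaussian prediction step concrete, here is a minimal sketch in plain Python. It is illustrative only and is not taken from the paper: it assumes a zero-mean GP (rather than the linear-regression mean mentioned above), an exponential covariance with hypothetical parameters `sigma2` and `phi`, and a small `nugget` added for numerical stability. The helper names `exp_cov`, `solve`, and `gp_predict` are made up for this example.

```python
import math

def exp_cov(s, t, sigma2=1.0, phi=1.0):
    """Exponential covariance sigma2 * exp(-phi * ||s - t||) (an assumed choice)."""
    return sigma2 * math.exp(-phi * math.dist(s, t))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_predict(locs, y, new_loc, sigma2=1.0, phi=1.0, nugget=1e-8):
    """Conditional mean and variance at new_loc for a zero-mean GP."""
    n = len(locs)
    K = [[exp_cov(locs[i], locs[j], sigma2, phi) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k = [exp_cov(s, new_loc, sigma2, phi) for s in locs]
    w = solve(K, k)  # weights K^{-1} k
    mean = sum(wi * yi for wi, yi in zip(w, y))
    var = sigma2 - sum(wi * ki for wi, ki in zip(w, k))
    return mean, var

# Three observed locations and a prediction at an unobserved one.
locs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
y = [1.0, 2.0, 3.0]
mean, var = gp_predict(locs, y, (0.5, 0.5))
```

Predicting at an observed location reproduces that observation with near-zero variance, while far from all data the prediction reverts to the prior mean and variance.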
Loop Closure Detection Based on Object-level Spatial Layout and Semantic Consistency
Ji, Xingwu, Liu, Peilin, Niu, Haochen, Chen, Xiang, Ying, Rendong, Wen, Fei
Visual simultaneous localization and mapping (SLAM) systems face challenges in detecting loop closure under large viewpoint changes. In this paper, we present an object-based loop closure detection method based on the spatial layout and semantic consistency of the 3D scene graph. Firstly, we propose an object-level data association approach based on the semantic information from semantic labels, intersection over union (IoU), object color, and object embedding. Subsequently, multi-view bundle adjustment with the associated objects is utilized to jointly optimize the poses of objects and cameras. We represent the refined objects as a 3D spatial graph with semantics and topology. Then, we propose a graph matching approach to select corresponding objects based on the structure layout and semantic property similarity of vertices' neighbors. Finally, we jointly optimize camera trajectories and object poses in an object-level pose graph optimization, which results in a globally consistent map. Experimental results demonstrate that our proposed data association approach can construct more accurate 3D semantic maps, and our loop closure method is more robust than point-based and object-based methods in circumstances with large viewpoint changes.
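As a rough illustration of two of the association cues named above (semantic label and IoU), the sketch below computes IoU for axis-aligned 3D boxes and gates a match on label equality. This is an assumption-laden toy, not the paper's method: the box format, the `same_object` helper, and the `iou_thresh` value are hypothetical, and the color and embedding cues are omitted.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:  # no overlap along this axis
            return 0.0
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)

def same_object(label_a, box_a, label_b, box_b, iou_thresh=0.3):
    """Toy association test: labels must agree and boxes must overlap enough.

    iou_thresh is a hypothetical value chosen for illustration.
    """
    return label_a == label_b and iou_3d(box_a, box_b) >= iou_thresh

# Two unit cubes offset by half a unit along x.
cube = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
shifted = (0.5, 0.0, 0.0, 1.5, 1.0, 1.0)
overlap = iou_3d(cube, shifted)
```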
WISK: A Workload-aware Learned Index for Spatial Keyword Queries
Sheng, Yufan, Cao, Xin, Fang, Yixiang, Zhao, Kaiqi, Qi, Jianzhong, Cong, Gao, Zhang, Wenjie
Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built based on the geo-textual data without considering the distribution of queries already received. However, previous studies have shown that utilizing the known query distribution can improve the index structure for future query processing. In this paper, we propose WISK, a learned index for spatial keyword queries, which self-adapts to optimize querying costs for a given query workload. One key challenge is how to utilize both structured spatial attributes and unstructured textual information when learning the index. We first divide the data objects into partitions, aiming to minimize the processing costs of the given query workload. We prove the NP-hardness of the partitioning problem and propose a machine learning model to find the optimal partitions. Then, to achieve more pruning power, we build a hierarchical structure based on the generated partitions in a bottom-up manner with a reinforcement learning-based approach. We conduct extensive experiments on real-world datasets and query workloads with various distributions, and the results show that WISK outperforms all competitors, achieving up to 8x speedup in querying time with comparable storage overhead.
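For readers unfamiliar with spatial keyword queries, the following sketch shows a brute-force version of one common variant, together with the kind of partition-level pruning test that an index relies on. It is not WISK itself. The conjunctive semantics (an object must contain all query keywords), the rectangle format, and the names `GeoTextObject`, `spatial_keyword_query`, and `partition_may_match` are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeoTextObject:
    x: float
    y: float
    keywords: frozenset

def spatial_keyword_query(objects, rect, query_keywords):
    """Return objects inside rect = (xmin, ymin, xmax, ymax) that contain
    every query keyword (assumed conjunctive semantics)."""
    xmin, ymin, xmax, ymax = rect
    q = set(query_keywords)
    return [o for o in objects
            if xmin <= o.x <= xmax and ymin <= o.y <= ymax and q <= o.keywords]

def partition_may_match(bbox, vocab, rect, query_keywords):
    """Pruning test: a partition can be skipped unless its bounding box
    overlaps the query rectangle and its keyword vocabulary (the union of
    its objects' keywords) covers all query keywords."""
    bx0, by0, bx1, by1 = bbox
    x0, y0, x1, y1 = rect
    overlaps = bx0 <= x1 and x0 <= bx1 and by0 <= y1 and y0 <= by1
    return overlaps and set(query_keywords) <= vocab

pois = [
    GeoTextObject(1.0, 1.0, frozenset({"coffee", "wifi"})),
    GeoTextObject(2.0, 2.0, frozenset({"coffee"})),
    GeoTextObject(9.0, 9.0, frozenset({"coffee", "wifi"})),
]
hits = spatial_keyword_query(pois, (0.0, 0.0, 5.0, 5.0), ["coffee", "wifi"])
```

A workload-aware index would choose the partitions so that tests like `partition_may_match` reject as many partitions as possible for the queries it expects to see.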
Road Network Representation Learning: A Dual Graph based Approach
Road networks are critical infrastructure powering many real-life applications, including transportation, mobility, and logistics. To leverage the input of a road network across these different applications, it is necessary to learn the representations of the roads in the form of vectors, which is named road network representation learning (RNRL). While several models have been proposed for RNRL, they capture only the pairwise relationships/connections among roads (i.e., as a simple graph), and fail to capture the high-order relationships among roads (e.g., roads that jointly form a local region usually have similar features such as speed limit) and long-range relationships (e.g., some roads that are far apart may have similar semantics such as being roads in residential areas). Motivated by this, we propose to construct a hypergraph, where each hyperedge corresponds to a set of multiple roads forming a region. The constructed hypergraph naturally captures the high-order relationships among roads with hyperedges. We then allow information propagation via both the edges in the simple graph and the hyperedges in the hypergraph in a graph neural network context. The graph reconstruction and hypergraph reconstruction tasks are conventional ones and can capture structural information. The hyperedge classification task can capture long-range relationships between pairs of roads that belong to hyperedges with the same label. We call the resulting model HyperRoad. We further extend HyperRoad to problem settings where additional inputs of road attributes and/or trajectories that are generated on the roads are available.
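The idea of propagating information through hyperedges can be sketched with a simple two-stage mean aggregation: each hyperedge (region) averages the features of its member roads, and each road then averages the messages of the hyperedges it belongs to. This is a generic illustration, not HyperRoad's actual layer; it has no learned weights, and the function name and data layout are hypothetical.

```python
def hypergraph_propagate(features, hyperedges):
    """One round of road -> hyperedge -> road mean aggregation.

    features:   dict mapping road id -> list of floats
    hyperedges: list of sets of road ids (each set is one region)
    Roads that belong to no hyperedge keep their own features.
    """
    dim = len(next(iter(features.values())))
    # Stage 1: each hyperedge message is the mean of its members' features.
    edge_msgs = [
        [sum(features[r][d] for r in members) / len(members) for d in range(dim)]
        for members in hyperedges
    ]
    # Stage 2: each road averages the messages of its incident hyperedges.
    out = {}
    for road, feat in features.items():
        incident = [m for members, m in zip(hyperedges, edge_msgs) if road in members]
        if incident:
            out[road] = [sum(m[d] for m in incident) / len(incident) for d in range(dim)]
        else:
            out[road] = list(feat)
    return out

# Four roads with a single feature (e.g. a normalized speed limit), two regions.
features = {"a": [0.0], "b": [2.0], "c": [4.0], "d": [10.0]}
regions = [{"a", "b"}, {"b", "c"}]
updated = hypergraph_propagate(features, regions)
```

Because a hyperedge connects all roads of a region at once, a single round already mixes information among every road in the region, which a pairwise edge cannot do in one step.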
EVKG: An Interlinked and Interoperable Electric Vehicle Knowledge Graph for Smart Transportation System
Qi, Yanlin, Mai, Gengchen, Zhu, Rui, Zhang, Michael
Over the past decade, the electric vehicle industry has experienced unprecedented growth and diversification, resulting in a complex ecosystem. To effectively manage this multifaceted field, we present an EV-centric knowledge graph (EVKG) as a comprehensive, cross-domain, extensible, and open geospatial knowledge management system. The EVKG encapsulates essential EV-related knowledge, including EV adoption, electric vehicle supply equipment, and electricity transmission network, to support decision-making related to EV technology development, infrastructure planning, and policy-making by providing timely and accurate information and analysis. To enrich and contextualize the EVKG, we integrate the developed EV-relevant ontology modules from existing well-known knowledge graphs and ontologies. This integration enables interoperability with other knowledge graphs in the Linked Data Open Cloud, enhancing the EVKG's value as a knowledge hub for EV decision-making. Using six competency questions, we demonstrate how the EVKG can be used to answer various types of EV-related questions, providing critical insights into the EV ecosystem. Our EVKG provides an efficient and effective approach for managing the complex and diverse EV industry. By consolidating critical EV-related knowledge into a single, easily accessible resource, the EVKG supports decision-makers in making informed choices about EV technology development, infrastructure planning, and policy-making. As a flexible and extensible platform, the EVKG is capable of accommodating a wide range of data sources, enabling it to evolve alongside the rapidly changing EV landscape.
DASS Good: Explainable Data Mining of Spatial Cohort Data
Wentzel, Andrew, Floricel, Carla, Canahuate, Guadalupe, Naser, Mohamed A., Mohamed, Abdallah S., Fuller, Clifton David, van Dijk, Lisanne, Marai, G. Elisabeta
Developing applicable clinical machine learning models is a difficult task when the data includes spatial information, for example, radiation dose distributions across adjacent organs at risk. We describe the co-design of a modeling system, DASS, to support the hybrid human-machine development and validation of predictive models for estimating long-term toxicities related to radiotherapy doses in head and neck cancer patients. Developed in collaboration with domain experts in oncology and data mining, DASS incorporates human-in-the-loop visual steering, spatial data, and explainable AI to augment domain knowledge with automatic data mining. We demonstrate DASS with the development of two practical clinical stratification models and report feedback from domain experts. Finally, we describe the design lessons learned from this collaborative experience.
Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images
Wu, Xindi, Lau, KwunFung, Ferroni, Francesco, Ošep, Aljoša, Ramanan, Deva
Self-driving vehicles rely on urban street maps for autonomous navigation. In this paper, we introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images, as needed to continually update and expand existing maps. This is a challenging task, as we need to infer a complex urban road topology directly from raw image data. The main insight of this paper is that this problem can be posed as cross-modal retrieval by learning a joint, cross-modal embedding space for images and existing maps, represented as discrete graphs that encode the topological layout of the visual surroundings. We conduct our experimental evaluation using the Argoverse dataset and show that it is indeed possible to accurately retrieve street maps corresponding to both seen and unseen roads solely from image data. Moreover, we show that our retrieved maps can be used to update or expand existing maps and even show proof-of-concept results for visual localization and image retrieval from spatial graphs.
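The retrieval step described above can be illustrated as a nearest-neighbor search in a shared embedding space. The sketch below assumes the image and the candidate map graphs have already been embedded as vectors by some learned encoders, which are not shown; the cosine similarity measure, the function names, and the toy two-dimensional embeddings are assumptions for illustration, not details from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_map(image_embedding, map_library):
    """Return the id of the library graph whose embedding is most similar
    to the image embedding.

    map_library: dict mapping graph id -> embedding vector
    """
    return max(map_library, key=lambda gid: cosine(image_embedding, map_library[gid]))

# Toy library of street-map graph embeddings (hypothetical values).
library = {
    "straight_road": [1.0, 0.0],
    "four_way_intersection": [0.0, 1.0],
}
best = retrieve_map([0.9, 0.1], library)
```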
Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition
Xing, Jiazheng, Wang, Mengmeng, Liu, Yong, Mu, Boyu
Spatial and temporal modeling is a core aspect of few-shot action recognition. Most previous works focus on long-term temporal relation modeling based on high-level spatial representations, without considering the crucial low-level spatial features and short-term temporal relations. Yet the former can bring rich local semantic information, and the latter can represent the motion characteristics of adjacent frames. In this paper, we propose SloshNet, a new framework that revisits the spatial and temporal modeling for few-shot action recognition in a finer manner. First, to exploit the low-level spatial features, we design a feature fusion architecture search module to automatically search for the best combination of the low-level and high-level spatial features. Next, inspired by the recent transformer, we introduce a long-term temporal modeling module to model the global temporal relations based on the extracted spatial appearance features. Meanwhile, we design another short-term temporal modeling module to encode the motion characteristics between adjacent frame representations. After that, the final predictions can be obtained by feeding the embedded rich spatial-temporal features to a common frame-level class prototype matcher. We extensively validate the proposed SloshNet on four few-shot action recognition datasets, including Something-Something V2, Kinetics, UCF101, and HMDB51. It achieves favorable results against state-of-the-art methods on all datasets.
Spatial Representations in the Parietal Cortex May Use Basis Functions
The parietal cortex is thought to represent the egocentric positions of objects in particular coordinate systems. We propose an alternative approach to spatial perception of objects in the parietal cortex from the perspective of sensorimotor transformations. The responses of single parietal neurons can be modeled as a gaussian function of retinal position multiplied by a sigmoid function of eye position, which form a set of basis functions. We show here how these basis functions can be used to generate receptive fields in either retinotopic or head-centered coordinates by simple linear transformations. This raises the possibility that the parietal cortex does not attempt to compute the positions of objects in a particular frame of reference but instead computes a general purpose representation of the retinal location and eye position from which any transformation can be synthesized by direct projection.
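The basis-function unit described above, a Gaussian of retinal position multiplied by a sigmoid of eye position, can be written down directly, along with a linear readout over a set of such units. This is a minimal sketch: the parameter values (`sigma`, `slope`), the function names, and the example weights are hypothetical, and no fitting of the readout weights is shown.

```python
import math

def basis_response(r, e, r_pref, e_thresh, sigma=10.0, slope=8.0):
    """Response of one model unit: Gaussian tuning to retinal position r
    (peak at r_pref) times a sigmoid of eye position e (midpoint e_thresh).
    sigma and slope are illustrative values, in degrees."""
    gauss = math.exp(-((r - r_pref) ** 2) / (2.0 * sigma ** 2))
    sigm = 1.0 / (1.0 + math.exp(-(e - e_thresh) / slope))
    return gauss * sigm

def linear_readout(weights, units, r, e):
    """Weighted sum of basis units; units is a list of (r_pref, e_thresh).
    Different weight vectors give different output frames of reference."""
    return sum(w * basis_response(r, e, r_pref, e_thresh)
               for w, (r_pref, e_thresh) in zip(weights, units))

# A small bank of units tiling retinal and eye position (hypothetical layout).
units = [(r_pref, e_thresh) for r_pref in (-20.0, 0.0, 20.0)
         for e_thresh in (-10.0, 10.0)]
weights = [1.0] * len(units)
value = linear_readout(weights, units, r=5.0, e=0.0)
```

The point of the representation is that the nonlinear part is shared: only the weights of the linear readout need to change to obtain a retinotopic or a head-centered output.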
A Model of Spatial Representations in Parietal Cortex Explains Hemineglect
We have recently developed a theory of spatial representations in which the position of an object is not encoded in a particular frame of reference but, instead, involves neurons computing basis functions of their sensory inputs. This type of representation is able to perform nonlinear sensorimotor transformations and is consistent with the response properties of parietal neurons. We now ask whether the same theory could account for the behavior of human patients with parietal lesions. These lesions induce a deficit known as hemineglect that is characterized by a lack of reaction to stimuli located in the hemispace contralateral to the lesion. A simulated lesion in a basis function representation was found to replicate three of the most important aspects of hemineglect: i) The models failed to cross the leftmost lines in line cancellation experiments, ii) the deficit affected multiple frames of reference and, iii) it could be object centered.