Spatial Reasoning
Qualitative Reasoning about 2D Cardinal Directions using Answer Set Programming
Izmirlioglu, Yusuf (a:1:{s:5:"en_US";s:18:"Sabanci University";}) | Erdem, Esra (Sabanci University, Faculty of Engineering and Natural Sciences)
We introduce a formal framework (called NCDC-ASP) for representing and reasoning about cardinal directions between extended spatial objects on a plane, using Answer Set Programming (ASP). NCDC-ASP preserves the meaning of cardinal directional relations as in Cardinal Directional Calculus (CDC), and provides solutions to all consistency checking problems in CDC under various conditions (i.e., for a complete/incomplete set of basic/disjunctive CDC constraints over connected/disconnected spatial objects). In particular, NCDC-ASP models a discretized version of the consistency checking problem in ASP, over a finite grid (rather than a plane), where we provide new lower bounds on the grid size to guarantee that it correctly characterizes solutions for the consistency checking in CDC. In addition, NCDC-ASP has the following two novelties important for applications. NCDC-ASP introduces default CDC constraints to represent and reason about background or commonsense knowledge that involves default qualitative directional relations (e.g., "the ice cream truck is by default to the north of the playground" or "the keyboard is normally placed in front of the monitor"). NCDC-ASP introduces inferred CDC constraints to allow inference of missing CDC relations and to provide them as explanations. We illustrate the uses and usefulness of NCDC-ASP with interesting scenarios from the real-world. We design and develop a variety of benchmark instances, and comprehensively evaluate NCDC-ASP from the perspectives of computational efficiency.
Spatio-Temporal EEG Representation Learning on Riemannian Manifold and Euclidean Space
We present a novel deep neural architecture for learning electroencephalogram (EEG). To learn the spatial information, our model first obtains the Riemannian mean and distance from spatial covariance matrices (SCMs) on a Riemannian manifold. We then project the spatial information onto a Euclidean space via tangent space learning. Following, two fully connected layers are used to learn the spatial information embeddings. Moreover, our proposed method learns the temporal information via differential entropy and logarithm power spectrum density features extracted from EEG signals in a Euclidean space using a deep long short-term memory network with a soft attention mechanism. To combine the spatial and temporal information, we use an effective fusion strategy, which learns attention weights applied to embedding-specific features for decision making. We evaluate our proposed framework on four public datasets across three popular EEG-related tasks, notably emotion recognition, vigilance estimation, and motor imagery classification, containing various types of tasks such as binary classification, multi-class classification, and regression. Our proposed architecture outperforms other methods on SEED-VIG, and approaches the state-of-the-art on the other three datasets (SEED, BCI-IV 2A, and BCI-IV 2B), showing the robustness of our framework in EEG representation learning. The source code of our paper is publicly available at https://github.com/guangyizhangbci/EEG_Riemannian.
A $2,490 wireless podcasting kit is absurd, but at least Nomono's sounds great
That was my first thought when I learned about Nomono's portable recording setup last year. Since then, the price has dropped slightly to $2,490. But my main concern remains: Who needs this thing when you can get a small audio recorder and lavalier mics for far less? Nomono's kit includes four wireless lav mics, as well as a space recorder for room tone and spatial data, all of which charge in a portable carrying case. Your recordings are automatically uploaded to Nomono's cloud service, where you can optimize their audio quality (and eventually edit them in your browser).
Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities
Yang, Hanchen, Li, Wengen, Wang, Shuyu, Li, Hui, Guan, Jihong, Zhou, Shuigeng, Cao, Jiannong
With the rapid amassing of spatial-temporal (ST) ocean data, many spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, including climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated but with unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models on ST ocean data. To the best of our knowledge, a comprehensive survey of existing studies remains missing in the literature, which hinders not only computer scientists from identifying the research issues in ocean data mining but also ocean scientists to apply advanced STDM techniques. In this paper, we provide a comprehensive survey of existing STDM studies for ocean science. Concretely, we first review the widely-used ST ocean datasets and highlight their unique characteristics. Then, typical ST ocean data quality enhancement techniques are explored. Next, we classify existing STDM studies in ocean science into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are discussed. This survey can help scientists from both computer science and ocean science better understand the fundamental concepts, key techniques, and open challenges of STDM for ocean science.
Poly-MOT: A Polyhedral Framework For 3D Multi-Object Tracking
Li, Xiaoyu, Xie, Tao, Liu, Dedong, Gao, Jinghan, Dai, Kun, Jiang, Zhiqiang, Zhao, Lijun, Wang, Ke
3D Multi-object tracking (MOT) empowers mobile robots to accomplish well-informed motion planning and navigation tasks by providing motion trajectories of surrounding objects. However, existing 3D MOT methods typically employ a single similarity metric and physical model to perform data association and state estimation for all objects. With large-scale modern datasets and real scenes, there are a variety of object categories that commonly exhibit distinctive geometric properties and motion patterns. In this way, such distinctions would enable various object categories to behave differently under the same standard, resulting in erroneous matches between trajectories and detections, and jeopardizing the reliability of downstream tasks (navigation, etc.). Towards this end, we propose Poly-MOT, an efficient 3D MOT method based on the Tracking-By-Detection framework that enables the tracker to choose the most appropriate tracking criteria for each object category. Specifically, Poly-MOT leverages different motion models for various object categories to characterize distinct types of motion accurately. We also introduce the constraint of the rigid structure of objects into a specific motion model to accurately describe the highly nonlinear motion of the object. Additionally, we introduce a two-stage data association strategy to ensure that objects can find the optimal similarity metric from three custom metrics for their categories and reduce missing matches. On the NuScenes dataset, our proposed method achieves state-of-the-art performance with 75.4\% AMOTA. The code is available at https://github.com/lixiaoyu2000/Poly-MOT
Dominating Set Database Selection for Visual Place Recognition
Kornilova, Anastasiia, Moskalenko, Ivan, Pushkin, Timofei, Tojiboev, Fakhriddin, Tariverdizadeh, Rahim, Ferrer, Gonzalo
This paper presents an approach for creating a visual place recognition (VPR) database for localization in indoor environments from RGBD scanning sequences. The proposed approach is formulated as a minimization problem in terms of dominating set algorithm for graph, constructed from spatial information, and referred as DominatingSet. Our algorithm shows better scene coverage in comparison to other methodologies that are used for database creation. Also, we demonstrate that using DominatingSet, a database size could be up to 250-1400 times smaller than the original scanning sequence while maintaining a recall rate of more than 80% on testing sequences. We evaluated our algorithm on 7-scenes and BundleFusion datasets and an additionally recorded sequence in a highly repetitive office setting. In addition, the database selection can produce weakly-supervised labels for fine-tuning neural place recognition algorithms to particular settings, improving even more their accuracy. The paper also presents a fully automated pipeline for VPR database creation from RGBD scanning sequences, as well as a set of metrics for VPR database evaluation. The code and released data are available on our web-page~ -- https://prime-slam.github.io/place-recognition-db/
A Novel Spatial-Temporal Variational Quantum Circuit to Enable Deep Learning on NISQ Devices
Li, Jinyang, Wang, Zhepeng, Hu, Zhirui, Date, Prasanna, Li, Ang, Jiang, Weiwen
Quantum computing presents a promising approach for machine learning with its capability for extremely parallel computation in high-dimension through superposition and entanglement. Despite its potential, existing quantum learning algorithms, such as Variational Quantum Circuits(VQCs), face challenges in handling more complex datasets, particularly those that are not linearly separable. What's more, it encounters the deployability issue, making the learning models suffer a drastic accuracy drop after deploying them to the actual quantum devices. To overcome these limitations, this paper proposes a novel spatial-temporal design, namely ST-VQC, to integrate non-linearity in quantum learning and improve the robustness of the learning model to noise. Specifically, ST-VQC can extract spatial features via a novel block-based encoding quantum sub-circuit coupled with a layer-wise computation quantum sub-circuit to enable temporal-wise deep learning. Additionally, a SWAP-Free physical circuit design is devised to improve robustness. These designs bring a number of hyperparameters. After a systematic analysis of the design space for each design component, an automated optimization framework is proposed to generate the ST-VQC quantum circuit. The proposed ST-VQC has been evaluated on two IBM quantum processors, ibm_cairo with 27 qubits and ibmq_lima with 7 qubits to assess its effectiveness. The results of the evaluation on the standard dataset for binary classification show that ST-VQC can achieve over 30% accuracy improvement compared with existing VQCs on actual quantum computers. Moreover, on a non-linear synthetic dataset, the ST-VQC outperforms a linear classifier by 27.9%, while the linear classifier using classical computing outperforms the existing VQC by 15.58%.
PGCN: Progressive Graph Convolutional Networks for Spatial-Temporal Traffic Forecasting
The complex spatial-temporal correlations in transportation networks make the traffic forecasting problem challenging. Since transportation system inherently possesses graph structures, much research efforts have been put with graph neural networks. Recently, constructing adaptive graphs to the data has shown promising results over the models relying on a single static graph structure. However, the graph adaptations are applied during the training phases, and do not reflect the data used during the testing phases. Such shortcomings can be problematic especially in traffic forecasting since the traffic data often suffers from the unexpected changes and irregularities in the time series. In this study, we propose a novel traffic forecasting framework called Progressive Graph Convolutional Network (PGCN). PGCN constructs a set of graphs by progressively adapting to input data during the training and the testing phases. Specifically, we implemented the model to construct progressive adjacency matrices by learning trend similarities among graph nodes. Then, the model is combined with the dilated causal convolution and gated activation unit to extract temporal features. With residual and skip connections, PGCN performs the traffic prediction. When applied to four real-world traffic datasets of diverse geometric nature, the proposed model achieves state-of-the-art performance with consistency in all datasets. We conclude that the ability of PGCN to progressively adapt to input data enables the model to generalize in different study sites with robustness.
Bivariate DeepKriging for Large-scale Spatial Interpolation of Wind Fields
Nag, Pratik, Sun, Ying, Reich, Brian J
High spatial resolution wind data are essential for a wide range of applications in climate, oceanographic and meteorological studies. Large-scale spatial interpolation or downscaling of bivariate wind fields having velocity in two dimensions is a challenging task because wind data tend to be non-Gaussian with high spatial variability and heterogeneity. In spatial statistics, cokriging is commonly used for predicting bivariate spatial fields. However, the cokriging predictor is not optimal except for Gaussian processes. Additionally, cokriging is computationally prohibitive for large datasets. In this paper, we propose a method, called bivariate DeepKriging, which is a spatially dependent deep neural network (DNN) with an embedding layer constructed by spatial radial basis functions for bivariate spatial data prediction. We then develop a distribution-free uncertainty quantification method based on bootstrap and ensemble DNN. Our proposed approach outperforms the traditional cokriging predictor with commonly used covariance functions, such as the linear model of co-regionalization and flexible bivariate Mat\'ern covariance. We demonstrate the computational efficiency and scalability of the proposed DNN model, with computations that are, on average, 20 times faster than those of conventional techniques. We apply the bivariate DeepKriging method to the wind data over the Middle East region at 506,771 locations. The prediction performance of the proposed method is superior over the cokriging predictors and dramatically reduces computation time.
One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching
Yang, Siyuan, Liu, Jun, Lu, Shijian, Hwa, Er Meng, Kot, Alex C.
One-shot skeleton action recognition, which aims to learn a skeleton action recognition model with a single training sample, has attracted increasing interest due to the challenge of collecting and annotating large-scale skeleton action data. However, most existing studies match skeleton sequences by comparing their feature vectors directly which neglects spatial structures and temporal orders of skeleton data. This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching. We represent skeleton data at multiple spatial and temporal scales and achieve optimal feature matching from two perspectives. The first is multi-scale matching which captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales simultaneously. The second is cross-scale matching which handles different motion magnitudes and speeds by capturing sample-wise relevance across multiple scales. Extensive experiments over three large-scale datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior one-shot skeleton action recognition, and it outperforms the state-of-the-art consistently by large margins.