Goto

Collaborating Authors

 Spatial Reasoning


Latent Graph Attention for Enhanced Spatial Context

arXiv.org Artificial Intelligence

Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent, however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA) a computationally inexpensive (linear to the number of nodes) and stable, modular framework for incorporating the global context in the existing architectures, especially empowering small-scale architectures to give performance closer to large size architectures, thus making the light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating to construct a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby being able to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.


Transaction Fraud Detection via Spatial-Temporal-Aware Graph Transformer

arXiv.org Artificial Intelligence

How to obtain informative representations of transactions and then perform the identification of fraudulent transactions is a crucial part of ensuring financial security. Recent studies apply Graph Neural Networks (GNNs) to the transaction fraud detection problem. Nevertheless, they encounter challenges in effectively learning spatial-temporal information due to structural limitations. Moreover, few prior GNN-based detectors have recognized the significance of incorporating global information, which encompasses similar behavioral patterns and offers valuable insights for discriminative representation learning. Therefore, we propose a novel heterogeneous graph neural network called Spatial-Temporal-Aware Graph Transformer (STA-GT) for transaction fraud detection problems. Specifically, we design a temporal encoding strategy to capture temporal dependencies and incorporate it into the graph neural network framework, enhancing spatial-temporal information modeling and improving expressive ability. Furthermore, we introduce a transformer module to learn local and global information. Pairwise node-node interactions overcome the limitation of the GNN structure and build up the interactions with the target node and long-distance ones. Experimental results on two financial datasets compared to general GNN models and GNN-based fraud detectors demonstrate that our proposed method STA-GT is effective on the transaction fraud detection task.


Kernel Estimates as General Concept for the Measuring of Pedestrian Density

arXiv.org Artificial Intelligence

The standard definition of pedestrian density produces scattered values, hence, many approaches have been developed to improve the features of the estimated density. This paper provides a review of generally applied methods and presents a general framework based on various kernels that bring desired properties of density estimates (e.g., continuity) and incorporate ordinarily used methods. The developed kernel concept considers each pedestrian as a source of density distribution, parametrized by the kernel type (e.g., Gauss, cone) and kernel size. The quantitative parametric study performed on experimental data illustrates that parametrization brings desired features, for instance, a conic kernel with a base radius in (0.7, 1.2) m produces smooth values that retain trend features. The correspondence between kernel and non-kernel methods (namely Voronoi diagram and customized inverse distance to the nearest pedestrian) is achievable for a wide range of kernel parameter. Thereby the generality of the concept is supported.


What Can Algebraic Topology and Differential Geometry Teach Us About Intrinsic Dynamics and Global Behavior of Robots?

arXiv.org Artificial Intelligence

Traditionally, robots are regarded as universal motion generation machines. They are designed mainly by kinematics considerations while the desired dynamics is imposed by strong actuators and high-rate control loops. As an alternative, one can first consider the robot's intrinsic dynamics and optimize it in accordance with the desired tasks. Therefore, one needs to better understand intrinsic, uncontrolled dynamics of robotic systems. In this paper we focus on periodic orbits, as fundamental dynamic properties with many practical applications. Algebraic topology and differential geometry provide some fundamental statements about existence of periodic orbits. As an example, we present periodic orbits of the simplest multi-body system: the double-pendulum in gravity. This simple system already displays a rich variety of periodic orbits. We classify these into three classes: toroidal orbits, disk orbits and nonlinear normal modes. Some of these we found by geometrical insights and some by numerical simulation and sampling.


SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space

arXiv.org Artificial Intelligence

While many natural language inference (NLI) datasets target certain semantic phenomena, e.g., negation, tense & aspect, monotonicity, and presupposition, to the best of our knowledge, there is no NLI dataset that involves diverse types of spatial expressions and reasoning. We fill this gap by semi-automatically creating an NLI dataset for spatial reasoning, called SpaceNLI. The data samples are automatically generated from a curated set of reasoning patterns, where the patterns are annotated with inference labels by experts. We test several SOTA NLI systems on SpaceNLI to gauge the complexity of the dataset and the system's capacity for spatial reasoning. Moreover, we introduce a Pattern Accuracy and argue that it is a more reliable and stricter measure than the accuracy for evaluating a system's performance on pattern-based generated data samples. Based on the evaluation results we find that the systems obtain moderate results on the spatial NLI problems but lack consistency per inference pattern. The results also reveal that non-projective spatial inferences (especially due to the "between" preposition) are the most challenging ones.


Cross-Camera Trajectories Help Person Retrieval in a Camera Network

arXiv.org Artificial Intelligence

We are concerned with retrieving a query person from multiple videos captured by a non-overlapping camera network. Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network. To address this issue, we propose a pedestrian retrieval framework based on cross-camera trajectory generation, which integrates both temporal and spatial information. To obtain pedestrian trajectories, we propose a novel cross-camera spatio-temporal model that integrates pedestrians' walking habits and the path layout between cameras to form a joint probability distribution. Such a spatio-temporal model among a camera network can be specified using sparsely sampled pedestrian data. Based on the spatio-temporal model, cross-camera trajectories can be extracted by the conditional random field model and further optimized by restricted non-negative matrix factorization. Finally, a trajectory re-ranking technique is proposed to improve the pedestrian retrieval results. To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset, the Person Trajectory Dataset, in real surveillance scenarios. Extensive experiments verify the effectiveness and robustness of the proposed method.


The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps

arXiv.org Artificial Intelligence

Scanned historical maps in libraries and archives are valuable repositories of geographic data that often do not exist elsewhere. Despite the potential of machine learning tools like the Google Vision APIs for automatically transcribing text from these maps into machine-readable formats, they do not work well with large-sized images (e.g., high-resolution scanned documents), cannot infer the relation between the recognized text and other datasets, and are challenging to integrate with post-processing tools. This paper introduces the mapKurator system, an end-to-end system integrating machine learning models with a comprehensive data processing pipeline. mapKurator empowers automated extraction, post-processing, and linkage of text labels from large numbers of large-dimension historical map scans. The output data, comprising bounding polygons and recognized text, is in the standard GeoJSON format, making it easily modifiable within Geographic Information Systems (GIS). The proposed system allows users to quickly generate valuable data from large numbers of historical maps for in-depth analysis of the map content and, in turn, encourages map findability, accessibility, interoperability, and reusability (FAIR principles). We deployed the mapKurator system and enabled the processing of over 60,000 maps and over 100 million text/place names in the David Rumsey Historical Map collection. We also demonstrated a seamless integration of mapKurator with a collaborative web platform to enable accessing automated approaches for extracting and linking text labels from historical map scans and collective work to improve the results.


DSTCGCN: Learning Dynamic Spatial-Temporal Cross Dependencies for Traffic Forecasting

arXiv.org Artificial Intelligence

Traffic forecasting is essential to intelligent transportation systems, which is challenging due to the complicated spatial and temporal dependencies within a road network. Existing works usually learn spatial and temporal dependencies separately, ignoring the dependencies crossing spatial and temporal dimensions. In this paper, we propose DSTCGCN, a dynamic spatial-temporal cross graph convolution network to learn dynamic spatial and temporal dependencies jointly via graphs for traffic forecasting. Specifically, we introduce a fast Fourier transform (FFT) based attentive selector to choose relevant time steps for each time step based on time-varying traffic data. Given the selected time steps, we introduce a dynamic cross graph construction module, consisting of the spatial graph construction, temporal connection graph construction, and fusion modules, to learn dynamic spatial-temporal cross dependencies without pre-defined priors. Extensive experiments on six real-world datasets demonstrate that DSTCGCN achieves the state-of-the-art performance.


HeGeL: A Novel Dataset for Geo-Location from Hebrew Text

arXiv.org Artificial Intelligence

The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning. Even though there are quite a few datasets in English used for geolocation, they are currently based on open-source data (Wikipedia and Twitter), where the location of the described place is mostly implicit, such that the location retrieval resolution is limited. Furthermore, there are no datasets available for addressing the problem of textual geolocation in morphologically rich and resource-poor languages, such as Hebrew. In this paper, we present the Hebrew Geo-Location (HeGeL) corpus, designed to collect literal place descriptions and analyze lingual geospatial reasoning. We crowdsourced 5,649 literal Hebrew place descriptions of various place types in three cities in Israel. Qualitative and empirical analysis show that the data exhibits abundant use of geospatial reasoning and requires a novel environmental representation.


STG4Traffic: A Survey and Benchmark of Spatial-Temporal Graph Neural Networks for Traffic Prediction

arXiv.org Artificial Intelligence

Traffic prediction has been an active research topic in the domain of spatial-temporal data mining. Accurate real-time traffic prediction is essential to improve the safety, stability, and versatility of smart city systems, i.e., traffic control and optimal routing. The complex and highly dynamic spatial-temporal dependencies make effective predictions still face many challenges. Recent studies have shown that spatial-temporal graph neural networks exhibit great potential applied to traffic prediction, which combines sequential models with graph convolutional networks to jointly model temporal and spatial correlations. However, a survey study of graph learning, spatial-temporal graph models for traffic, as well as a fair comparison of baseline models are pending and unavoidable issues. In this paper, we first provide a systematic review of graph learning strategies and commonly used graph convolution algorithms. Then we conduct a comprehensive analysis of the strengths and weaknesses of recently proposed spatial-temporal graph network models. Furthermore, we build a study called STG4Traffic using the deep learning framework PyTorch to establish a standardized and scalable benchmark on two types of traffic datasets. We can evaluate their performance by personalizing the model settings with uniform metrics. Finally, we point out some problems in the current study and discuss future directions. Source codes are available at https://github.com/trainingl/STG4Traffic.