 spatial distance


End-to-end Autonomous Vehicle Following System using Monocular Fisheye Camera

Zhang, Jiale, Qian, Yeqiang, Qin, Tong, Jiang, Mingyang, Chen, Siyuan, Yang, Ming

arXiv.org Artificial Intelligence

The increase in vehicle ownership has led to increased traffic congestion, more accidents, and higher carbon emissions. Vehicle platooning is a promising solution to address these issues by improving road capacity and reducing fuel consumption. However, existing platooning systems face challenges such as reliance on lane markings and expensive high-precision sensors, which limit their general applicability. To address these issues, we propose a vehicle following framework that extends vehicle following from restricted scenarios to general ones using only a camera. This is achieved through our newly proposed end-to-end method, which improves overall driving performance. The method incorporates a semantic mask to address causal confusion in multi-frame data fusion. Additionally, we introduce a dynamic sampling mechanism to precisely track the trajectories of preceding vehicles. Extensive closed-loop validation in real-world vehicle experiments demonstrates the system's ability to follow vehicles in various scenarios, outperforming traditional multi-stage algorithms. This makes it a promising solution for cost-effective autonomous vehicle platooning. A complete real-world vehicle experiment is available at https://youtu.be/zL1bcVb9kqQ.


Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms

Chen, Boyi, Wang, Zhangyu, Deuser, Fabian, Zollner, Johann Maximilian, Werner, Martin

arXiv.org Artificial Intelligence

Accurate and robust image-based geo-localization at a global scale is challenging due to diverse environments, visually ambiguous scenes, and the lack of distinctive landmarks in many regions. While contrastive learning methods show promising performance by aligning features between street-view images and corresponding locations, they neglect the underlying spatial dependency in the geographic space. As a result, they fail to address the issue of false negatives (image pairs that are both visually and geographically similar but labeled as negatives) and struggle to effectively distinguish hard negatives, which are visually similar but geographically distant. To address this issue, we propose a novel spatially regularized contrastive learning strategy that integrates a semivariogram, a geostatistical tool for modeling how spatial correlation changes with distance. We fit the semivariogram by relating the distance of images in feature space to their geographical distance, capturing the expected visual content in a spatial correlation. With the fitted semivariogram, we define the expected visual dissimilarity at a given spatial distance as a reference to identify hard negatives and false negatives. We integrate this strategy into GeoCLIP and evaluate it on the OSV5M dataset, demonstrating that explicitly modeling spatial priors improves image-based geo-localization performance, particularly at finer granularity.
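
The semivariogram idea in this abstract can be illustrated with a minimal sketch of the classical empirical estimator, which takes half the mean squared difference of point pairs grouped by separation distance. This is not the authors' implementation: the function name, the planar Euclidean distance, and the binning scheme are assumptions made for illustration, and fitting a parametric model to the binned values and using it to flag hard or false negatives is not shown.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, features, bin_edges):
    """Half the mean squared feature distance of point pairs, per distance bin.

    coords:    list of (x, y) planar positions. This is an assumption for the
               sketch; real geographic data would need a great-circle distance.
    features:  list of equal-length feature vectors, one per point.
    bin_edges: increasing distances defining bins [e_k, e_{k+1}).
    Returns one value per bin, or None where a bin holds no pairs.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])  # spatial separation of the pair
        sq = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        for k in range(n_bins):
            if bin_edges[k] <= h < bin_edges[k + 1]:
                sums[k] += sq
                counts[k] += 1
                break
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Toy data: three points on a line whose 1-D feature equals their x position.
gamma = empirical_semivariogram(
    coords=[(0, 0), (1, 0), (3, 0)],
    features=[[0.0], [1.0], [3.0]],
    bin_edges=[0, 1.5, 2.5, 4],
)
print(gamma)  # [0.5, 2.0, 4.5]
```

In the toy data the feature dissimilarity grows with spatial separation, which is the kind of trend a fitted semivariogram would summarize as an expected dissimilarity at each distance.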


Explaining Vision GNNs: A Semantic and Visual Analysis of Graph-based Image Classification

Chaidos, Nikolaos, Dimitriou, Angeliki, Spanos, Nikolaos, Voulodimos, Athanasios, Stamou, Giorgos

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have emerged as an efficient alternative to convolutional approaches for vision tasks such as image classification, leveraging patch-based representations instead of raw pixels. These methods construct graphs where image patches serve as nodes, and edges are established based on patch similarity or classification relevance. Despite their efficiency, the explainability of GNN-based vision models remains underexplored, even though graphs are naturally interpretable. In this work, we analyze the semantic consistency of the graphs formed at different layers of GNN-based image classifiers, focusing on how well they preserve object structures and meaningful relationships. A comprehensive analysis is presented by quantifying the extent to which inter-layer graph connections reflect semantic similarity and spatial coherence. Explanations from standard and adversarial settings are also compared to assess whether they reflect the classifiers' robustness. Additionally, we visualize the flow of information across layers through heatmap-based visualization techniques, thereby highlighting the models' explainability. Our findings demonstrate that the decision-making processes of these models can be effectively explained, while also revealing that their reasoning does not necessarily align with human perception, especially in deeper layers.


Spatial distance dependent Chinese restaurant processes for image segmentation

Neural Information Processing Systems

The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing "human-like" segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.
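
The clustering bias described in this abstract can be sketched by drawing a partition from the generic ddCRP prior: each point links to another point with probability proportional to a decay function of their distance, or to itself with probability proportional to a concentration parameter, and clusters are the connected components of the resulting links. The sketch below is an illustration under stated assumptions, not the paper's model: the exponential decay, the parameter names `alpha` and `decay_scale`, and the function name are hypothetical choices, and neither the appearance likelihood nor the hierarchical extension is included.

```python
import math
import random


def sample_ddcrp_partition(points, alpha, decay_scale, rng):
    """Draw cluster labels for `points` from a ddCRP prior.

    points:      list of coordinate tuples.
    alpha:       self-link weight (larger values give more clusters).
    decay_scale: length scale of the assumed exponential decay exp(-d / scale).
    rng:         a random.Random instance, passed in for reproducibility.
    """
    n = len(points)
    # Step 1: every point picks one link, to itself or to another point.
    links = []
    for i in range(n):
        weights = [
            alpha if j == i
            else math.exp(-math.dist(points[i], points[j]) / decay_scale)
            for j in range(n)
        ]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])

    # Step 2: clusters are connected components of the link graph (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
    return [find(i) for i in range(n)]


# Two tight groups of points far apart; nearby points tend to share a label.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = sample_ddcrp_partition(pts, alpha=1.0, decay_scale=1.0, rng=random.Random(0))
print(labels)
```

Because links are sampled rather than cluster assignments, a single changed link can merge or split whole groups, which is what makes the prior non-exchangeable and sensitive to the chosen distance.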


LTS-NET: End-to-end Unsupervised Learning of Long-Term 3D Stable objects

Hroob, Ibrahim, Molina, Sergi, Polvara, Riccardo, Cielniak, Grzegorz, Hanheide, Marc

arXiv.org Artificial Intelligence

In this research, we present an end-to-end data-driven pipeline for determining the long-term stability status of objects within a given environment, specifically distinguishing between static and dynamic objects. Understanding object stability is key for mobile robots since long-term stable objects can be exploited as landmarks for long-term localisation. Our pipeline includes a labelling method that utilizes historical data from the environment to generate training data for a neural network. Rather than utilizing discrete labels, we propose the use of point-wise continuous label values, indicating the spatio-temporal stability of individual points, to train a point cloud regression network named LTS-NET. Our approach is evaluated on point cloud data from two parking lots in the NCLT dataset, and the results show that our proposed solution outperforms direct training of a classification model for static vs dynamic object classification.


Generative Models and Learning Algorithms for Core-Periphery Structured Graphs

Gurugubelli, Sravanthi, Chepuri, Sundeep Prabhakar

arXiv.org Artificial Intelligence

We consider core-periphery structured graphs, in which densely connected nodes are referred to as core nodes and sparsely connected nodes as periphery nodes. The so-called core score of a node is related to the likelihood of it being a core node. In this paper, we focus on learning the core scores of a graph from its node attributes and connectivity structure. To this end, we propose two classes of probabilistic graphical models: affine and nonlinear. First, we describe affine generative models to model the dependence of node attributes on the nodes' core scores, which determine the graph structure. Next, we discuss nonlinear generative models in which the partial correlations of node attributes influence the graph structure through latent core scores. We develop algorithms for inferring the model parameters and core scores of a graph when both the graph structure and node attributes are available. When only the node attributes of graphs are available, we jointly learn a core-periphery structured graph and its core scores. We provide results from numerical experiments on several synthetic and real-world datasets to demonstrate the efficacy of the developed models and algorithms.


STAN: Spatio-Temporal Attention Network for Next Location Recommendation

Luo, Yingtao, Liu, Qiang, Liu, Zhaocheng

arXiv.org Artificial Intelligence

Next location recommendation is at the core of various location-based applications. Current state-of-the-art models have attempted to solve spatial sparsity with hierarchical gridding and to model temporal relations with explicit time intervals, while some vital questions remain unsolved. Non-adjacent locations and non-consecutive visits provide non-trivial correlations for understanding a user's behavior but were rarely considered. To aggregate all relevant visits from a user's trajectory and recall the most plausible candidates from weighted representations, here we propose a Spatio-Temporal Attention Network (STAN) for location recommendation. STAN explicitly exploits relative spatiotemporal information of all the check-ins with self-attention layers along the trajectory. This improvement allows a point-to-point interaction between non-adjacent locations and non-consecutive check-ins with explicit spatiotemporal effect. STAN uses a bi-layer attention architecture that first aggregates spatiotemporal correlation within the user trajectory and then recalls the target with consideration of personalized item frequency (PIF). By visualization, we show that STAN is in line with the above intuition. Experimental results unequivocally show that our model outperforms the existing state-of-the-art methods by 9-17%.


Spatial distance dependent Chinese restaurant processes for image segmentation

Ghosh, Soumya, Ungureanu, Andrei B., Sudderth, Erik B., Blei, David M.

Neural Information Processing Systems

The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing "human-like" segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain.


The Length of Bridge Ties: Structural and Geographic Properties of Online Social Interactions

Volkovich, Yana (Barcelona Media Foundation) | Scellato, Salvatore (University of Cambridge) | Laniado, David (Barcelona Media Foundation) | Mascolo, Cecilia (University of Cambridge) | Kaltenbrunner, Andreas (Barcelona Media Foundation)

AAAI Conferences

The popularity of the Web has allowed individuals to communicate and interact with each other on a global scale: people connect both to close friends and acquaintances, creating ties that can bridge otherwise separated groups of people. Recent evidence suggests that spatial distance is still affecting social links established on online platforms, with online ties preferentially connecting closer people. In this work we study the relationships between interaction strength, spatial distance and structural position of ties between members of a large-scale online social networking platform, Tuenti. We discover that ties in highly connected social groups tend to span shorter distances than connections bridging together otherwise separated portions of the network. We also find that such bridging connections have lower social interaction levels than ties within the inner core of the network and ties connecting to its periphery. Our results suggest that spatial constraints on online social networks are intimately connected to structural network properties, with important consequences for information diffusion.