AITopics | Spatial Reasoning

Collaborating Authors

Spatial Reasoning

News Overviews Instructional Materials AI-Alerts Classics

Urban Region Pre-training and Prompting: A Graph-based Approach

Jin, Jiahui, Song, Yifan, Kan, Dong, Zhu, Haojia, Sun, Xiangguo, Li, Zhicheng, Sun, Xigang, Zhang, Jinghui

arXiv.org Artificial IntelligenceAug-26-2024

Urban region representation is crucial for various urban downstream tasks. However, despite the proliferation of methods and their success, acquiring general urban region knowledge and adapting to different tasks remains challenging. Previous work often neglects the spatial structures and functional layouts between entities, limiting their ability to capture transferable knowledge across regions. Further, these methods struggle to adapt effectively to specific downstream tasks, as they do not adequately address the unique features and relationships required for different downstream tasks. In this paper, we propose a $\textbf{G}$raph-based $\textbf{U}$rban $\textbf{R}$egion $\textbf{P}$re-training and $\textbf{P}$rompting framework ($\textbf{GURPP}$) for region representation learning. Specifically, we first construct an urban region graph that integrates detailed spatial entity data for more effective urban region representation. Then, we develop a subgraph-centric urban region pre-training model to capture the heterogeneous and transferable patterns of interactions among entities. To further enhance the adaptability of these embeddings to different tasks, we design two graph-based prompting methods to incorporate explicit/hidden task knowledge. Extensive experiments on various urban region prediction tasks and different cities demonstrate the superior performance of our GURPP framework.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2408.0592

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)

Add feedback

Pixel-Aligned Multi-View Generation with Depth Guided Decoder

Tang, Zhenggang, Zhuang, Peiye, Wang, Chaoyang, Siarohin, Aliaksandr, Kant, Yash, Schwing, Alexander, Tulyakov, Sergey, Lee, Hsin-Ying

arXiv.org Artificial IntelligenceAug-26-2024

The task of image-to-multi-view generation refers to generating novel views of an instance from a single image. Recent methods achieve this by extending text-to-image latent diffusion models to multi-view version, which contains an VAE image encoder and a U-Net diffusion model. Specifically, these generation methods usually fix VAE and finetune the U-Net only. However, the significant downscaling of the latent vectors computed from the input images and independent decoding leads to notable pixel-level misalignment across multiple views. To address this, we propose a novel method for pixel-level image-to-multi-view generation. Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model. Specifically, we introduce a depth-truncated epipolar attention, enabling the model to focus on spatially adjacent regions while remaining memory efficient. Applying depth-truncated attn is challenging during inference as the ground-truth depth is usually difficult to obtain and pre-trained depth estimation models is hard to provide accurate depth. Thus, to enhance the generalization to inaccurate depth when ground truth depth is missing, we perturb depth inputs during training. During inference, we employ a rapid multi-view to 3D reconstruction approach, NeuS, to obtain coarse depth for the depth-truncated epipolar attention. Our model enables better pixel alignment across multi-view images. Moreover, we demonstrate the efficacy of our approach in improving downstream multi-view to 3D reconstruction tasks.

decoder, diffusion model, epipolar attention, (13 more...)

arXiv.org Artificial Intelligence

2408.14016

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)

Add feedback

A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

Khanam, Tahmina, Laga, Hamid, Bennamoun, Mohammed, Wang, Guanjin, Sohel, Ferdous, Boussaid, Farid, Wang, Guan, Srivastava, Anuj

arXiv.org Artificial IntelligenceAug-22-2024

We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch, and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shapes using Square Root Velocity Function Trees (SRVFT). By solving the spatial registration in the SRVFT space, which is equipped with an L2 metric, 4D tree-shaped structures become time-parameterized trajectories in this space. This reduces the problem of modeling and analyzing 4D tree-like shapes to that of modeling and analyzing elastic trajectories in the SRVFT space, where elasticity refers to time warping. In this paper, we propose a novel mathematical representation of the shape space of such trajectories, a Riemannian metric on that space, and computational tools for fast and accurate spatiotemporal registration and geodesics computation between 4D tree-shaped structures. Leveraging these building blocks, we develop a full framework for modelling the spatiotemporal variability using statistical models and generating novel 4D tree-like structures from a set of exemplars. We demonstrate and validate the proposed framework using real 4D plant data.

machine learning, registration, temporal reasoning, (17 more...)

arXiv.org Artificial Intelligence

2408.12443

Country:

Asia > China (0.04)
North America > United States (0.04)
Oceania > New Zealand (0.04)
Oceania > Australia > Western Australia (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.41)

Add feedback

Self-supervised Learning for Geospatial AI: A Survey

Chen, Yile, Huang, Weiming, Zhao, Kaiqi, Jiang, Yue, Cong, Gao

arXiv.org Artificial IntelligenceAug-22-2024

The proliferation of geospatial data in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across various urban applications. Given the vast yet inherently sparse labeled nature of geospatial data, there is a critical need for techniques that can effectively leverage such data without heavy reliance on labeled datasets. This requirement aligns with the principles of self-supervised learning (SSL), which has attracted increasing attention for its adoption in geospatial data. This paper conducts a comprehensive and up-to-date survey of SSL techniques applied to or developed for three primary data (geometric) types prevalent in geospatial vector data: points, polylines, and polygons. We systematically categorize various SSL techniques into predictive and contrastive methods, discussing their application with respect to each data type in enhancing generalization across various downstream tasks. Furthermore, we review the emerging trends of SSL for GeoAI, and several task-specific SSL techniques. Finally, we discuss several key challenges in the current research and outline promising directions for future investigation. By presenting a structured analysis of relevant studies, this paper aims to inspire continued advancements in the integration of SSL with GeoAI, encouraging innovative methods to harnessing the power of geospatial data.

learning, representation, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2408.12133

Country:

Asia > Singapore (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > Sweden (0.04)
Africa (0.04)

Genre: Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Consumer Products & Services (0.93)
Transportation > Ground > Road (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.92)

Add feedback

ST-USleepNet: A Spatial-Temporal Coupling Prominence Network for Multi-Channel Sleep Staging

Ma, Jingying, Lin, Qika, Jia, Ziyu, Feng, Mengling

arXiv.org Artificial IntelligenceAug-21-2024

Sleep staging is critical for assessing sleep quality and diagnosing disorders. Recent advancements in artificial intelligence have driven the development of automated sleep staging models, which still face two significant challenges. 1) Simultaneously extracting prominent temporal and spatial sleep features from multi-channel raw signals, including characteristic sleep waveforms and salient spatial brain networks. 2) Capturing the spatial-temporal coupling patterns essential for accurate sleep staging. To address these challenges, we propose a novel framework named ST-USleepNet, comprising a spatial-temporal graph construction module (ST) and a U-shaped sleep network (USleepNet). The ST module converts raw signals into a spatial-temporal graph to model spatial-temporal couplings. The USleepNet utilizes a U-shaped structure originally designed for image segmentation. Similar to how image segmentation isolates significant targets, when applied to both raw sleep signals and ST module-generated graph data, USleepNet segments these inputs to extract prominent temporal and spatial sleep features simultaneously. Testing on three datasets demonstrates that ST-USleepNet outperforms existing baselines, and model visualizations confirm its efficacy in extracting prominent sleep features and temporal-spatial coupling patterns across various sleep stages. The code is available at: https://github.com/Majy-Yuji/ST-USleepNet.git.

brain network, characteristic sleep waveform, hemisphere, (12 more...)

arXiv.org Artificial Intelligence

2408.11884

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Sleep (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition

Huang, Wenbo, Zhang, Jinghui, Qian, Xuwei, Wu, Zhen, Wang, Meng, Zhang, Lei

arXiv.org Artificial IntelligenceAug-21-2024

High frame-rate (HFR) videos of action recognition improve fine-grained expression while reducing the spatio-temporal relation and motion information density. Thus, large amounts of video samples are continuously required for traditional data-driven training. However, samples are not always sufficient in real-world scenarios, promoting few-shot action recognition (FSAR) research. We observe that most recent FSAR works build spatio-temporal relation of video samples via temporal alignment after spatial feature extraction, cutting apart spatial and temporal features within samples. They also capture motion information via narrow perspectives between adjacent frames without considering density, leading to insufficient motion information capturing. Therefore, we propose a novel plug-and-play architecture for FSAR called Spatio-tempOral frAme tuPle enhancer (SOAP) in this paper. The model we designed with such architecture refers to SOAP-Net. Temporal connections between different feature channels and spatio-temporal relation of features are considered instead of simple feature extraction. Comprehensive motion information is also captured, using frame tuples with multiple frames containing more motion information than adjacent frames. Combining frame tuples of diverse frame counts further provides a broader perspective. SOAP-Net achieves new state-of-the-art performance across well-known benchmarks such as SthSthV2, Kinetics, UCF101, and HMDB51. Extensive empirical evaluations underscore the competitiveness, pluggability, generalization, and robustness of SOAP. The code is released at https://github.com/wenbohuang1002/SOAP.

information, motion information, recognition, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3664647.3681062

2407.16344

Country:

Oceania > Australia > Victoria > Melbourne (0.05)
Asia > China > Jiangsu Province > Nanjing (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.48)

Add feedback

Optimal Bound for PCA with Outliers using Higher-Degree Voronoi Diagrams

Hashemian, Sajjad, Arvenaghi, Mohammad Saeed, Ardeshir-Larijani, Ebrahim

arXiv.org Artificial IntelligenceAug-19-2024

In this paper, we introduce new algorithms for Principal Component Analysis (PCA) with outliers. Utilizing techniques from computational geometry, specifically higher-degree Voronoi diagrams, we navigate to the optimal subspace for PCA even in the presence of outliers. This approach achieves an optimal solution with a time complexity of $n^{d+\mathcal{O}(1)}\text{poly}(n,d)$. Additionally, we present a randomized algorithm with a complexity of $2^{\mathcal{O}(r(d-r))} \times \text{poly}(n, d)$. This algorithm samples subspaces characterized in terms of a Grassmannian manifold. By employing such sampling method, we ensure a high likelihood of capturing the optimal subspace, with the success probability $(1 - \delta)^T$. Where $\delta$ represents the probability that a sampled subspace does not contain the optimal solution, and $T$ is the number of subspaces sampled, proportional to $2^{r(d-r)}$. Our use of higher-degree Voronoi diagrams and Grassmannian based sampling offers a clearer conceptual pathway and practical advantages, particularly in handling large datasets or higher-dimensional settings.

algorithm, subspace, voronoi diagram, (13 more...)

arXiv.org Artificial Intelligence

2408.06867

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)

Add feedback

GeoTransformer: Enhancing Urban Forecasting with Geospatial Attention Mechanisms

Jia, Yuhao, Wu, Zile, Yi, Shengao, Sun, Yifei

arXiv.org Artificial IntelligenceAug-16-2024

Recent advancements have focused on encoding urban spatial information into high-dimensional spaces, with notable efforts dedicated to integrating sociodemographic data and satellite imagery. These efforts have established foundational models in this field. However, the effective utilization of these spatial representations for urban forecasting applications remains under-explored. To address this gap, we introduce GeoTransformer, a novel structure that synergizes the Transformer architecture with geospatial statistics prior. GeoTransformer employs an innovative geospatial attention mechanism to incorporate extensive urban information and spatial dependencies into a unified predictive model. Specifically, we compute geospatial weighted attention scores between the target region and surrounding regions and leverage the integrated urban information for predictions. Extensive experiments on GDP and ride-share demand prediction tasks demonstrate that GeoTransformer significantly outperforms existing baseline models, showcasing its potential to enhance urban forecasting tasks.

geotransformer, prediction, representation, (17 more...)

arXiv.org Artificial Intelligence

2408.08852

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.93)
Banking & Finance > Economy (0.69)
Transportation > Passenger (0.68)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Representation Learning of Geometric Trees

Zhang, Zheng, Zhang, Allen, Nelson, Ruth, Ascoli, Giorgio, Zhao, Liang

arXiv.org Artificial IntelligenceAug-16-2024

Geometric trees are characterized by their tree-structured layout and spatially constrained nodes and edges, which significantly impacts their topological attributes. This inherent hierarchical structure plays a crucial role in domains such as neuron morphology and river geomorphology, but traditional graph representation methods often overlook these specific characteristics of tree structures. To address this, we introduce a new representation learning framework tailored for geometric trees. It first features a unique message passing neural network, which is both provably geometrical structure-recoverable and rotation-translation invariant. To address the data label scarcity issue, our approach also includes two innovative training targets that reflect the hierarchical ordering and geometric structure of these geometric trees. This enables fully self-supervised learning without explicit labels. We validate our method's effectiveness on eight real-world datasets, demonstrating its capability to represent geometric trees.

dataset, node, representation, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3637528.3671688

2408.08799

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Virginia > Fairfax County > Reston (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)

Add feedback

System States Forecasting of Microservices with Dynamic Spatio-Temporal Data

Xu, Yifei, Ge, Jingguo, Tang, Haina, Ding, Shuai, Li, Tong, Li, Hui

arXiv.org Artificial IntelligenceAug-14-2024

In the AIOps (Artificial Intelligence for IT Operations) era, accurately forecasting system states is crucial. In microservices systems, this task encounters the challenge of dynamic and complex spatio-temporal relationships among microservice instances, primarily due to dynamic deployments, diverse call paths, and cascading effects among instances. Current time-series forecasting methods, which focus mainly on intrinsic patterns, are insufficient in environments where spatial relationships are critical. Similarly, spatio-temporal graph approaches often neglect the nature of temporal trend, concentrating mostly on message passing between nodes. Moreover, current research in microservices domain frequently underestimates the importance of network metrics and topological structures in capturing the evolving dynamics of systems. This paper introduces STMformer, a model tailored for forecasting system states in microservices environments, capable of handling multi-node and multivariate time series. Our method leverages dynamic network connection data and topological information to assist in modeling the intricate spatio-temporal relationships within the system. Additionally, we integrate the PatchCrossAttention module to compute the impact of cascading effects globally. We have developed a dataset based on a microservices system and conducted comprehensive experiments with STMformer against leading methods. In both short-term and long-term forecasting tasks, our model consistently achieved a 8.6% reduction in MAE(Mean Absolute Error) and a 2.2% reduction in MSE (Mean Squared Error). The source code is available at https://github.com/xuyifeiiie/STMformer.

information, node, time step, (12 more...)

arXiv.org Artificial Intelligence

2408.07894

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)

Genre:

Research Report (1.00)
Overview (0.68)

Industry: Information Technology (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Cloud Computing (1.00)
(3 more...)

Add feedback