AITopics | Spatial Reasoning

Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents

Wang, Hao, Li, Tang, Chu, Chenhui, Zhu, Nengjun, Wang, Rui, Zhu, Pinpin

arXiv.org Artificial Intelligence, Mar-23-2024

Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to consider this valuable prior information related to visual and spatial features, resulting in suboptimal performance, particularly when dealing with limited examples. To address this limitation, our research focuses on few-shot relational learning, specifically targeting the extraction of key-value relation triplets in VRDs. Given the absence of a suitable dataset for this task, we introduce two new few-shot benchmarks built upon existing supervised benchmark datasets. Furthermore, we propose a variational approach that incorporates relational 2D-spatial priors and prototypical rectification techniques. This approach aims to generate relation representations that are more aware of the spatial context and of unseen relations, in a manner similar to human perception. Experimental results demonstrate the effectiveness of our proposed method by showcasing its ability to outperform existing methods. This study also opens up new possibilities for practical applications.

computational linguistic, dataset, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2403.15765

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
(8 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline

Li, Shuhao, Cui, Yue, Xu, Jingyi, Li, Libin, Meng, Lingkai, Yang, Weidong, Zhang, Fan, Zhou, Xiaofang

arXiv.org Artificial Intelligence, Mar-22-2024

Traffic prediction has long been a focal and pivotal area in research, witnessing significant strides from city-level to road-level predictions in recent years. With the advancement of Vehicle-to-Everything (V2X) technologies, autonomous driving, and large-scale models in the traffic domain, lane-level traffic prediction has emerged as an indispensable direction. However, further progress in this field is hindered by the absence of comprehensive and unified evaluation standards, coupled with limited public availability of data and code. This paper extensively analyzes and categorizes existing research in lane-level traffic prediction, establishes a unified spatial topology structure and prediction tasks, and introduces a simple baseline model, GraphMLP, based on graph structure and MLP networks. We have replicated code that is not publicly available from existing studies and, on this basis, thoroughly and fairly assessed various models in terms of effectiveness, efficiency, and applicability, providing insights for practical applications. Additionally, we have released three new datasets and the corresponding code to accelerate progress in this field, all of which can be found at https://github.com/ShuhaoLii/TITS24LaneLevel-Traffic-Benchmark.

dataset, prediction, traffic prediction, (16 more...)

arXiv.org Artificial Intelligence

2403.14941

Country:

Asia > China > Guangdong Province > Guangzhou (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
(7 more...)

Genre:

Research Report (1.00)
Overview (0.67)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Towards Effective Next POI Prediction: Spatial and Semantic Augmentation with Remote Sensing Data

Jiang, Nan, Yuan, Haitao, Si, Jianing, Chen, Minxiao, Wang, Shangguang

arXiv.org Artificial Intelligence, Mar-22-2024

Next point-of-interest (POI) prediction is a significant task in location-based services, yet its complexity arises from the consolidation of spatial and semantic intent. This fusion is subject to the influences of historical preferences, prevailing location, and environmental factors, thereby posing significant challenges. In addition, the uneven POI distribution further complicates the next POI prediction procedure. To address these challenges, we enrich input features and propose an effective deep-learning method within a two-step prediction framework. Our method first incorporates remote sensing data, capturing pivotal environmental context to enhance input features regarding both location and semantics. Subsequently, we employ a region quad-tree structure to integrate urban remote sensing, road network, and POI distribution spaces, aiming to devise a more coherent graph representation method for urban space. Leveraging this method, we construct the QR-P graph for the user's historical trajectories to encapsulate historical travel knowledge, thereby augmenting input features with comprehensive spatial and semantic insights. We devise distinct embedding modules to encode these features and employ an attention mechanism to fuse diverse encodings. In the two-step prediction procedure, we initially identify potential spatial zones by predicting user-preferred tiles, followed by pinpointing specific POIs of a designated type within the projected tiles. Empirical findings from four real-world location-based social network datasets underscore the remarkable superiority of our proposed approach over competitive baseline methods.
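To make the region quad-tree idea in this abstract more concrete, here is a minimal sketch of how a square area might be split into finer tiles where POIs are dense and left coarse where they are sparse. The `Tile` class, the split-on-capacity rule, and the sample coordinates are all assumptions made for illustration; the paper's actual construction (including how remote sensing and road network data enter) is not described here and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """A square region that splits into four children when it holds too many POIs."""
    x: float              # lower-left corner
    y: float
    size: float           # side length
    capacity: int = 2     # hypothetical threshold: max POIs before a split
    max_depth: int = 8    # guard against endless splitting on duplicate points
    depth: int = 0
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, px, py):
        # Inner tiles forward the point to the child that contains it.
        if self.children:
            self._child_for(px, py).insert(px, py)
            return
        self.points.append((px, py))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        half = self.size / 2
        # Children are stored row by row: index = row * 2 + col.
        self.children = [
            Tile(self.x + dx * half, self.y + dy * half, half,
                 self.capacity, self.max_depth, self.depth + 1)
            for dy in (0, 1) for dx in (0, 1)
        ]
        pending, self.points = self.points, []
        for px, py in pending:
            self._child_for(px, py).insert(px, py)

    def _child_for(self, px, py):
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[row * 2 + col]

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Hypothetical POIs in a unit square: three clustered, one isolated.
root = Tile(0.0, 0.0, 1.0)
for poi in [(0.1, 0.1), (0.2, 0.2), (0.15, 0.3), (0.9, 0.9)]:
    root.insert(*poi)
leaf_tiles = root.leaves()
```

In this toy run the dense corner ends up covered by smaller tiles than the sparse one, which is the property that makes a quad-tree a natural fit for uneven POI distributions.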

pois, recommendation, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2404.04271

Country:

North America > United States > California (0.04)
North America > United States > New York (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.92)
Transportation (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)

Grounding Spatial Relations in Text-Only Language Models

Azkune, Gorka, Salaberria, Ander, Agirre, Eneko

arXiv.org Artificial Intelligence, Mar-20-2024

This paper shows that text-only Language Models (LMs) can learn to ground spatial relations like "left of" or "below" if they are provided with explicit location information of objects and they are properly trained to leverage those locations. We perform experiments on a verbalized version of the Visual Spatial Reasoning (VSR) dataset, where images are coupled with textual statements which contain real or fake spatial relations between two objects of the image. We verbalize the images using an off-the-shelf object detector, adding location tokens to every object label to represent their bounding boxes in textual form. Given the small size of VSR, we do not observe any improvement when using locations, but pretraining the LM over a synthetic dataset automatically derived by us improves results significantly when using location tokens. We thus show that locations allow LMs to ground spatial relations, with our text-only LMs outperforming Vision-and-Language Models and setting the new state-of-the-art for the VSR dataset. Our analysis shows that our text-only LMs can generalize beyond the relations seen in the synthetic dataset to some extent, also learning more useful information than that encoded in the spatial rules we used to create the synthetic dataset itself.
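The verbalization step described in this abstract can be pictured with a small sketch: each detected object label is followed by tokens that encode its bounding box as discretized coordinates. The `<loc_N>` token format, the bin count, the separator, and the sample detections are assumptions chosen for illustration and are not taken from the paper.

```python
def quantize(value, bins):
    """Map a normalized coordinate in [0, 1] to an integer bin index."""
    # Clamp so that a coordinate of exactly 1.0 falls in the last bin.
    return min(int(value * bins), bins - 1)


def verbalize(detections, bins=32):
    """Turn (label, (x_min, y_min, x_max, y_max)) detections into one text line.

    Coordinates are assumed to be normalized to [0, 1]. The "<loc_N>" token
    format is a hypothetical choice for this sketch.
    """
    parts = []
    for label, box in detections:
        tokens = " ".join(f"<loc_{quantize(c, bins)}>" for c in box)
        parts.append(f"{label} {tokens}")
    return " ; ".join(parts)


# Hypothetical detector output for an image with two objects.
detections = [
    ("cat", (0.0, 0.5, 0.25, 1.0)),
    ("sofa", (0.5, 0.25, 1.0, 1.0)),
]
scene_text = verbalize(detections, bins=4)
```

A text-only model would then receive `scene_text` together with a statement such as "the cat is left of the sofa" and decide whether the statement holds.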

location token, relation, spatial relation, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.neunet.2023.11.031

2403.13666

Country:

Europe > Spain > Basque Country (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.67)

Linguistics from a topological viewpoint

Dong, Rui

arXiv.org Artificial Intelligence, Mar-16-2024

... numbers are the dimensions of the k-th persistent homology parameterized by the threshold r. More than just counting topological structures with persistent Betti numbers, we can detect at which threshold values a topological structure is born and dead. ... Fortunately there are many such suitable options, one option is the p-Wasserstein distance with p > 0 being a parameter. Especially when p = ∞, we call the ∞-Wasserstein distance the bottleneck distance. We skip the exact definition of p-Wasserstein distance here since it is too technical, ...

circular structure, cloud, persistence diagram, (15 more...)

arXiv.org Artificial Intelligence

2403.1544

Country:

South America > Peru (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Colombia (0.04)
(9 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.69)

Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

Neural Information Processing Systems, Mar-15-2024, 15:42:28 GMT

There has been a recent push in extraction of 3D spatial layout of scenes.

configuration, hypothesis, reasoning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

A2CI: A Cloud-based, Service-oriented Geospatial Cyberinfrastructure to Support Atmospheric Research

Li, Wenwen, Shao, Hu, Wang, Sizhe, Zhou, Xiran, Wu, Sheng

arXiv.org Artificial Intelligence, Mar-15-2024

In recent years, atmospheric research has received increasing attention from environmental experts and the public because atmospheric phenomena such as El Niño, global warming, ozone depletion, and drought that may have negative effects on the Earth's climate and ecosystem are occurring more often (Walther et al., 2002; Karl and Trenberth, 2003; Trenberth et al., 2014). In order to model the status quo and predict the trend of atmospheric phenomena and events, researchers need to retrieve data from various relevant domains, such as chemical components of aerosols and gases, the terrestrial surface, energy consumption, the hydrosphere, the biosphere, etc. (Schneider, 2006; Fowler et al., 2009; Guilyardi et al., 2009; Ramanathan et al., 2011; Katul et al., 2012). In complex earth system modeling, the data and services for atmospheric study present the characteristics of being distributed, collaborative and adaptive (Plale et al., 2006). The massive volume, rapid velocity and wide variety of data have led to a new era of atmospheric research that consists of accessing and integrating big data from distributed sources, conducting collaborative analysis in an interactive way, providing intelligent services for data management, and integration and visualization to foster discovery of hidden or new knowledge. One of the most important ways to support these activities is to establish a national or international spatial data infrastructure and geospatial cyberinfrastructure on which the data and computational resources can be easily shared, spatial analysis tools can be executed on-the-fly and the scientific results can be effectively visualized (Yang et al., 2008; Li et al., 2011).
Technically, a geospatial cyberinfrastructure (GCI) is an architecture that effectively utilizes geo-referenced data to connect people, information and computers based on the standardized data access protocols, high speed internet, high-performance computing facilities (HPC) and service-oriented data management (Yang et al., 2010). Since the concept's official introduction by the National Science Foundation (NSF) in its 2003 blue ribbon report, cyberinfrastructure research has attracted much attention from the atmospheric science domain because of its promise of bringing paradigm change for

dataset, information, visualization, (17 more...)

arXiv.org Artificial Intelligence

2403.14693

Country:

Pacific Ocean (0.05)
Atlantic Ocean (0.05)
Oceania > Australia (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (0.88)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications > Web (1.00)
(2 more...)

Persistent Homology for Learning Densities with Bounded Support

Neural Information Processing Systems, Mar-14-2024, 04:59:45 GMT

We present a novel method for learning densities with bounded support which enables us to incorporate 'hard' topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and - by incorporating persistent homology techniques in our approach - we are able to encode algebraic-topological constraints which are not addressed in current state-of-the-art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car.

constraint, information, trajectory, (16 more...)

Neural Information Processing Systems

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment > Sports > Motorsports (1.00)
Leisure & Entertainment > Sports > Horse Racing (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.54)

Uncertainty estimation in spatial interpolation of satellite precipitation with ensemble learning

Papacharalampous, Georgia, Tyralis, Hristos, Doulamis, Nikolaos, Doulamis, Anastasios

arXiv.org Artificial Intelligence, Mar-14-2024

Predictions in the form of probability distributions are crucial for decision-making. Quantile regression enables this within spatial interpolation settings for merging remote sensing and gauge precipitation data. However, ensemble learning of quantile regression algorithms remains unexplored in this context. Here, we address this gap by introducing nine quantile-based ensemble learners and applying them to large precipitation datasets. We employed a novel feature engineering strategy, reducing predictors to distance-weighted satellite precipitation at relevant locations, combined with location elevation. Our ensemble learners include six stacking and three simple methods (mean, median, best combiner), combining six individual algorithms: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). These algorithms serve as both base learners and combiners within different stacking methods. We evaluated performance against QR using quantile scoring functions on a large dataset comprising 15 years of monthly gauge-measured and satellite precipitation in the contiguous US (CONUS). Stacking with QR and QRNN yielded the best results across quantile levels of interest (0.025, 0.050, 0.075, 0.100, 0.200, 0.300, 0.400, 0.500, 0.600, 0.700, 0.800, 0.900, 0.925, 0.950, 0.975), surpassing the reference method by 3.91% to 8.95%. This demonstrates the potential of stacking to improve probabilistic predictions in spatial interpolation and beyond.
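For readers unfamiliar with quantile scoring functions, the sketch below shows the standard quantile (pinball) loss together with a simple mean combiner of the kind listed among the "simple methods" in this abstract. The function names and the toy numbers are assumptions for illustration only; they are not the authors' code, data, or results.

```python
from statistics import mean


def quantile_score(y_true, y_pred, tau):
    """Quantile (pinball) loss for a single prediction at quantile level tau."""
    diff = y_true - y_pred
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_quantile_score(ys, preds, tau):
    """Average quantile loss over a set of observations (lower is better)."""
    return mean(quantile_score(y, p, tau) for y, p in zip(ys, preds))


def mean_combiner(base_predictions):
    """Simple ensemble: average the base learners' predictions per observation."""
    return [mean(column) for column in zip(*base_predictions)]


# Toy example (hypothetical values): two base learners, three observations.
observed = [10.0, 20.0, 30.0]
learner_a = [8.0, 22.0, 30.0]
learner_b = [12.0, 18.0, 26.0]
combined = mean_combiner([learner_a, learner_b])
score = mean_quantile_score(observed, combined, tau=0.9)
```

Because the loss is asymmetric, a prediction for the 0.9 quantile is penalized more heavily for falling below the observation than for exceeding it, which is what pushes a fitted model toward the requested quantile level.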

algorithm, learner, prediction, (14 more...)

arXiv.org Artificial Intelligence

2403.10567

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
Europe > Greece (0.05)
(4 more...)

Genre: Research Report (0.82)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models

Huang, Haoxu, Lin, Fanqi, Hu, Yingdong, Wang, Shengjie, Gao, Yang

arXiv.org Artificial Intelligence, Mar-13-2024

Foundation models pre-trained on web-scale data are shown to encapsulate extensive world knowledge beneficial for robotic manipulation in the form of task planning. However, the actual physical implementation of these plans often relies on task-specific learning methods, which require significant data collection and struggle with generalizability. In this work, we introduce Robotic Manipulation through Spatial Constraints of Parts (CoPa), a novel framework that leverages the common sense knowledge embedded within foundation models to generate a sequence of 6-DoF end-effector poses for open-world robotic manipulation. Specifically, we decompose the manipulation process into two phases: task-oriented grasping and task-aware motion planning. In the task-oriented grasping phase, we employ foundation vision-language models (VLMs) to select the object's grasping part through a novel coarse-to-fine grounding mechanism. During the task-aware motion planning phase, VLMs are utilized again to identify the spatial geometry constraints of task-relevant object parts, which are then used to derive post-grasp poses. We also demonstrate how CoPa can be seamlessly integrated with existing robotic planning algorithms to accomplish complex, long-horizon tasks. Our comprehensive real-world experiments show that CoPa possesses a fine-grained physical understanding of scenes, capable of handling open-set instructions and objects with minimal prompt engineering and without additional training. Project page: https://copa-2024.github.io/

arxiv preprint arxiv, constraint, preprint arxiv, (14 more...)

arXiv.org Artificial Intelligence

2403.08248

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.71)
