AITopics | Spatial Reasoning

Collaborating Authors

Spatial Reasoning

News Overviews Instructional Materials AI-Alerts Classics

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

Yang, Yuncong, Yang, Han, Zhou, Jiachen, Chen, Peihao, Zhang, Hongxin, Du, Yilun, Gan, Chuang

arXiv.org Artificial IntelligenceDec-15-2024

Constructing compact and informative 3D scene representations is essential for effective embodied exploration and reasoning, especially in complex environments over extended periods. Existing representations, such as object-centric 3D scene graphs, oversimplify spatial relationships by modeling scenes as isolated objects with restrictive textual relationships, making it difficult to address queries requiring nuanced spatial understanding. Moreover, these representations lack natural mechanisms for active exploration and memory management, hindering their application to lifelong autonomy. In this work, we propose 3D-Mem, a novel 3D scene memory framework for embodied agents. 3D-Mem employs informative multi-view images, termed Memory Snapshots, to represent the scene and capture rich visual information of explored regions. It further integrates frontier-based exploration by introducing Frontier Snapshots-glimpses of unexplored areas-enabling agents to make informed decisions by considering both known and potential new information. To support lifelong memory in active exploration settings, we present an incremental construction pipeline for 3D-Mem, as well as a memory retrieval technique for memory management. Experimental results on three benchmarks demonstrate that 3D-Mem significantly enhances agents' exploration and reasoning capabilities in 3D environments, highlighting its potential for advancing applications in embodied AI.

memory snapshot, representation, snapshot, (13 more...)

arXiv.org Artificial Intelligence

2411.17735

Country:

Oceania > Australia > Western Australia > Perth (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
(2 more...)

Add feedback

EEG-GMACN: Interpretable EEG Graph Mutual Attention Convolutional Network

Ye, Haili, Goerttler, Stephan, He, Fei

arXiv.org Artificial IntelligenceDec-15-2024

Electroencephalogram (EEG) is a valuable technique to record brain electrical activity through electrodes placed on the scalp. Analyzing EEG signals contributes to the understanding of neurological conditions and developing brain-computer interface. Graph Signal Processing (GSP) has emerged as a promising method for EEG spatial-temporal analysis, by further considering the topological relationships between electrodes. However, existing GSP studies lack interpretability of electrode importance and the credibility of prediction confidence. This work proposes an EEG Graph Mutual Attention Convolutional Network (EEG-GMACN), by introducing an 'Inverse Graph Weight Module' to output interpretable electrode graph weights, enhancing the clinical credibility and interpretability of EEG classification results. Additionally, we incorporate a mutual attention mechanism module into the model to improve its capability to distinguish critical electrodes and introduce credibility calibration to assess the uncertainty of prediction results. This study enhances the transparency and effectiveness of EEG analysis, paving the way for its widespread use in clinical and neuroscience research.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.17834

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.30)

Add feedback

Unveiling Topological Structures in Text: A Comprehensive Survey of Topological Data Analysis Applications in NLP

Uchendu, Adaku, Le, Thai

arXiv.org Artificial IntelligenceDec-14-2024

The surge of data available on the internet has led to the adoption of various computational methods to analyze and extract valuable insights from this wealth of information. Among these, the field of Machine Learning (ML) has thrived by leveraging data to extract meaningful insights. However, ML techniques face notable challenges when dealing with real-world data, often due to issues of imbalance, noise, insufficient labeling, and high dimensionality. To address these limitations, some researchers advocate for the adoption of Topological Data Analysis (TDA), a statistical approach that discerningly captures the intrinsic shape of data despite noise. Despite its potential, TDA has not gained as much traction within the Natural Language Processing (NLP) domain compared to structurally distinct areas like computer vision. Nevertheless, a dedicated community of researchers has been exploring the application of TDA in NLP, yielding 87 papers we comprehensively survey in this paper. Our findings categorize these efforts into theoretical and non-theoretical approaches. Theoretical approaches aim to explain linguistic phenomena from a topological viewpoint, while non-theoretical approaches merge TDA with ML features, utilizing diverse numerical representation techniques. We conclude by exploring the challenges and unresolved questions that persist in this niche field. Resources and a list of papers on this topic can be found at: https://github.com/AdaUchendu/AwesomeTDA4NLP.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.10298

Country:

Oceania > Australia (0.04)
North America > United States > Texas (0.04)
North America > United States > Missouri > Greene County > Springfield (0.04)
(13 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Industry:

Government (0.93)
Information Technology > Security & Privacy (0.69)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
(2 more...)

Add feedback

GeoConformal prediction: a model-agnostic framework of measuring the uncertainty of spatial prediction

Lou, Xiayin, Luo, Peng, Meng, Liqiu

arXiv.org Machine LearningDec-13-2024

Spatial prediction is a fundamental task in geography. In recent years, with advances in geospatial artificial intelligence (GeoAI), numerous models have been developed to improve the accuracy of geographic variable predictions. Beyond achieving higher accuracy, it is equally important to obtain predictions with uncertainty measures to enhance model credibility and support responsible spatial prediction. Although geostatistic methods like Kriging offer some level of uncertainty assessment, such as Kriging variance, these measurements are not always accurate and lack general applicability to other spatial models. To address this issue, we propose a model-agnostic uncertainty assessment method called GeoConformal Prediction, which incorporates geographical weighting into conformal prediction. We applied it to two classic spatial prediction cases, spatial regression and spatial interpolation, to evaluate its reliability. First, in the spatial regression case, we used XGBoost to predict housing prices, followed by GeoConformal to calculate uncertainty. Our results show that GeoConformal achieved a coverage rate of 93.67%, while Bootstrap methods only reached a maximum coverage of 81.00% after 2000 runs. Next, we applied GeoConformal to spatial interpolation models. We found that the uncertainty obtained from GeoConformal aligned closely with the variance in Kriging. Finally, using GeoConformal, we analyzed the sources of uncertainty in spatial prediction. We found that explicitly including local features in AI models can significantly reduce prediction uncertainty, especially in areas with strong local dependence. Our findings suggest that GeoConformal holds potential not only for geographic knowledge discovery but also for guiding the design of future GeoAI models, paving the way for more reliable and interpretable spatial prediction frameworks.

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Machine Learning

2412.08661

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Oil & Gas > Upstream (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A dual contrastive framework

Sun, Yuan, Zhang, Zhao, Ortiz, Jorge

arXiv.org Artificial IntelligenceDec-13-2024

In current multimodal tasks, models typically freeze the encoder and decoder while adapting intermediate layers to task-specific goals, such as region captioning. Region-level visual understanding presents significant challenges for large-scale vision-language models. While limited spatial awareness is a known issue, coarse-grained pretraining, in particular, exacerbates the difficulty of optimizing latent representations for effective encoder-decoder alignment. We propose AlignCap, a framework designed to enhance region-level understanding through fine-grained alignment of latent spaces. Our approach introduces a novel latent feature refinement module that enhances conditioned latent space representations to improve region-level captioning performance. We also propose an innovative alignment strategy, the semantic space alignment module, which boosts the quality of multimodal representations. Additionally, we incorporate contrastive learning in a novel manner within both modules to further enhance region-level captioning performance. To address spatial limitations, we employ a General Object Detection (GOD) method as a data preprocessing pipeline that enhances spatial reasoning at the regional level. Extensive experiments demonstrate that our approach significantly improves region-level captioning performance across various tasks

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.10348

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification

Rao, Haocong, Miao, Chunyan

arXiv.org Artificial IntelligenceDec-12-2024

Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios. Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning. However, they rarely explore key body structure and motion such as gait to focus on more important body joints or limbs, while lacking the ability to fully mine valuable spatial-temporal sub-patterns of skeletons to enhance model learning. This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos) that exploits structure-specific and gait-related body relations as well as combinatorial features of skeleton graphs to learn effective skeleton representations for person re-ID. In particular, motivated by the locality within joints' structure and the body-component collaboration in gait, we first propose the motif guided graph transformer (MGT) that incorporates hierarchical structural motifs and gait collaborative motifs, which simultaneously focuses on multi-order local joint correlations and key cooperative body parts to enhance skeleton relation learning. Then, we devise the combinatorial skeleton prototype learning (CSP) that leverages random spatial-temporal combinations of joint nodes and skeleton graphs to generate diverse sub-skeleton and sub-tracklet representations, which are contrasted with the most representative features (prototypes) of each identity to learn class-related semantics and discriminative skeleton representations. Extensive experiments validate the superior performance of MoCos over existing state-of-the-art models. We further show its generality under RGB-estimated skeletons, different graph modeling, and unsupervised scenarios.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.09044

Country:

Asia > Singapore (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

VLMs meet UDA: Boosting Transferability of Open Vocabulary Segmentation with Unsupervised Domain Adaptation

Alcover-Couso, Roberto, Escudero-Viñolo, Marcos, SanMiguel, Juan C., Bescos, Jesus

arXiv.org Artificial IntelligenceDec-12-2024

Segmentation models are typically constrained by the categories defined during training. To address this, researchers have explored two independent approaches: adapting Vision-Language Models (VLMs) and leveraging synthetic data. However, VLMs often struggle with granularity, failing to disentangle fine-grained concepts, while synthetic data-based methods remain limited by the scope of available datasets. This paper proposes enhancing segmentation accuracy across diverse domains by integrating Vision-Language reasoning with key strategies for Unsupervised Domain Adaptation (UDA). First, we improve the fine-grained segmentation capabilities of VLMs through multi-scale contextual data, robust text embeddings with prompt augmentation, and layer-wise fine-tuning in our proposed Foundational-Retaining Open Vocabulary Semantic Segmentation (FROVSS) framework. Next, we incorporate these enhancements into a UDA framework by employing distillation to stabilize training and cross-domain mixed sampling to boost adaptability without compromising generalization. The resulting UDA-FROVSS framework is the first UDA approach to effectively adapt across domains without requiring shared categories.

category, dataset, segmentation, (13 more...)

arXiv.org Artificial Intelligence

2412.0924

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > Germany (0.04)
Asia > Japan (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Improving CS Performance by Developing Spatial Skills

Communications of the ACMDec-11-2024, 17:59:54 GMT

In the mid-2000s, the concept of brain training gained popularity, especially among high-achieving individuals. Brain training is often gamified, such as in Sudoku or Wordle, to encourage people to practice domain-independent cognitive skills. Advocates claimed that by practicing fundamental reasoning, memory, and problem-solving skills, a person can globally improve cognitive function.33 The scientific evidence, however, suggests that these claims are true only locally, meaning that performance improves on only the practiced tasks.28 For example, if people practice memorizing long lists of numbers, then they get better at memorizing long lists of numbers but at no other memory tasks.39

cs performance, spatial skill, stem performance and achievement, (8 more...)

Communications of the ACM

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.41)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.37)

Add feedback

Modeling High-Resolution Spatio-Temporal Wind with Deep Echo State Networks and Stochastic Partial Differential Equations

Wang, Kesen, Kim, Minwoo, Castruccio, Stefano, Genton, Marc G.

arXiv.org Machine LearningDec-10-2024

In the past decades, clean and renewable energy has gained increasing attention due to a global effort on carbon footprint reduction. In particular, Saudi Arabia is gradually shifting its energy portfolio from an exclusive use of oil to a reliance on renewable energy, and, in particular, wind. Modeling wind for assessing potential energy output in a country as large, geographically diverse and understudied as Saudi Arabia is a challenge which implies highly non-linear dynamic structures in both space and time. To address this, we propose a spatio-temporal model whose spatial information is first reduced via an energy distance-based approach and then its dynamical behavior is informed by a sparse and stochastic recurrent neural network (Echo State Network). Finally, the full spatial data is reconstructed by means of a non-stationary stochastic partial differential equation-based approach. Our model can capture the fine scale wind structure and produce more accurate forecasts of both wind speed and energy in lead times of interest for energy grid management and save annually as much as one million dollar against the closest competitive model.

artificial intelligence, machine learning, spatial reasoning, (20 more...)

arXiv.org Machine Learning

2412.07265

Country:

North America > United States (1.00)
Europe > Germany (0.14)
Asia > South Korea (0.14)
Asia > Middle East > Saudi Arabia > Eastern Province > Rub' al Khali (0.14)

Genre: Research Report (1.00)

Industry: Energy > Renewable > Wind (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model

Wang, Lening, Zheng, Wenzhao, Du, Dalong, Zhang, Yunpeng, Ren, Yilong, Jiang, Han, Cui, Zhiyong, Yu, Haiyang, Zhou, Jie, Lu, Jiwen, Zhang, Shanghang

arXiv.org Artificial IntelligenceDec-10-2024

4D driving simulation is essential for developing realistic autonomous driving simulators. Despite advancements in existing methods for generating driving scenes, significant challenges remain in view transformation and spatial-temporal dynamic modeling. To address these limitations, we propose a Spatial-Temporal simulAtion for drivinG (Stag-1) model to reconstruct real-world scenes and design a controllable generative network to achieve 4D simulation. Stag-1 constructs continuous 4D point cloud scenes using surround-view data from autonomous vehicles. It decouples spatial-temporal relationships and produces coherent keyframe videos. Additionally, Stag-1 leverages video generation models to obtain photo-realistic and controllable 4D driving simulation videos from any perspective. To expand the range of view generation, we train vehicle motion videos based on decomposed camera poses, enhancing modeling capabilities for distant scenes. Furthermore, we reconstruct vehicle camera trajectories to integrate 3D points across consecutive views, enabling comprehensive scene understanding along the temporal dimension. Following extensive multi-level scene training, Stag-1 can simulate from any desired viewpoint and achieve a deep understanding of scene evolution under static spatial-temporal conditions. Compared to existing methods, our approach shows promising performance in multi-view scene consistency, background coherence, and accuracy, and contributes to the ongoing advancements in realistic autonomous driving simulation. Code: https://github.com/wzzheng/Stag.

artificial intelligence, machine learning, simulation, (15 more...)

arXiv.org Artificial Intelligence

2412.0528

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.80)

Add feedback