 spatial pattern


Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective

Xue, Qiyao, Liu, Weichen, Wang, Shiqi, Wang, Haoming, Wu, Yuyang, Gao, Wei

arXiv.org Artificial Intelligence

Spatial reasoning is a core aspect of human intelligence that allows perception, inference and planning in 3D environments. However, current vision-language models (VLMs) struggle to maintain geometric coherence and cross-view consistency for spatial reasoning in multi-view settings. We attribute this gap to the lack of fine-grained benchmarks that isolate multi-view reasoning from single-view perception and temporal factors. To address this, we present ReMindView-Bench, a cognitively grounded benchmark for evaluating how VLMs construct, align and maintain spatial mental models across complementary viewpoints. ReMindView-Bench systematically varies viewpoint spatial pattern and query type to probe key factors of spatial cognition. Evaluations of 15 current VLMs reveal consistent failures in cross-view alignment and perspective-taking in multi-view spatial reasoning, motivating deeper analysis of the reasoning process. Explicit phase-wise analysis using LLM-as-a-judge and self-consistency prompting shows that VLMs perform well on in-frame perception but degrade sharply when integrating information across views. Implicit analysis, including linear probing and entropy dynamics, further shows progressive loss of task-relevant information and uncertainty separation between correct and incorrect trajectories. These results provide a cognitively grounded diagnosis of VLM spatial reasoning and reveal how multi-view spatial mental models are formed, degraded and destabilized across reasoning phases. The ReMindView-Bench benchmark is available at https://huggingface.co/datasets/Xue0823/ReMindView-Bench, and the source code for benchmark construction and VLM reasoning analysis is available at https://github.com/pittisl/ReMindView-Bench.



Motor Imagery EEG Signal Classification Using Minimally Random Convolutional Kernel Transform and Hybrid Deep Learning

Hwaidi, Jamal, Ghanem, Mohamed Chahine

arXiv.org Artificial Intelligence

The brain-computer interface (BCI) establishes a non-muscle channel that enables direct communication between the human body and an external device. Electroencephalography (EEG) is a popular non-invasive technique for recording brain signals. It is critical to process and comprehend the hidden patterns linked to a specific cognitive or motor task, for instance, as measured through the motor imagery brain-computer interface (MI-BCI). Classifying motor imagery-based electroencephalogram (MI-EEG) tasks is a significant challenge, given that EEG signals exhibit nonstationarity, time-variance, and individual diversity. Obtaining good classification accuracy is also very difficult due to the growing number of classes and the natural variability among individuals. To overcome these issues, this paper proposes a novel method for classifying EEG motor imagery signals that extracts features efficiently with the Minimally Random Convolutional Kernel Transform (MiniRocket); a linear classifier then uses the extracted features for activity recognition. Furthermore, a novel deep learning architecture based on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) is proposed to serve as a baseline, and it is demonstrated that classification via MiniRocket's features achieves higher performance than the best deep learning models at lower computational cost. The PhysioNet dataset was used to evaluate the performance of the proposed approaches. The proposed models achieved mean accuracy values of 98.63% and 98.06% for the MiniRocket and CNN-LSTM, respectively. The findings demonstrate that the proposed approach can significantly enhance motor imagery EEG accuracy and provide new insights into the feature extraction and classification of MI-EEG.


GRIT: Graph Transformer For Internal Ice Layer Thickness Prediction

Liu, Zesheng, Rahnemoonfar, Maryam

arXiv.org Artificial Intelligence

Gaining a deeper understanding of the thickness and variability of internal ice layers in radar imagery is essential for monitoring snow accumulation, better evaluating ice dynamics processes, and minimizing uncertainties in climate models. Radar sensors, capable of penetrating ice, capture detailed radargram images of internal ice layers. In this work, we introduce GRIT, a graph transformer for ice layer thickness. GRIT integrates an inductive geometric graph learning framework with an attention mechanism, designed to map the relationships between shallow and deeper ice layers. Compared to baseline graph neural networks, GRIT demonstrates consistently lower prediction errors. These results highlight the attention mechanism's effectiveness in capturing temporal changes across ice layers, while the graph transformer combines the strengths of transformers for learning long-range dependencies with graph neural networks for capturing spatial patterns, enabling robust modeling of complex spatiotemporal dynamics.


Learning to generate physical ocean states: Towards hybrid climate modeling

Meunier, Etienne, Kamm, David, Gachon, Guillaume, Lguensat, Redouane, Deshayes, Julie

arXiv.org Artificial Intelligence

Ocean General Circulation Models require extensive computational resources to reach equilibrium states, while deep learning emulators, despite offering fast predictions, lack the physical interpretability and long-term stability necessary for climate scientists to understand climate sensitivity (to greenhouse gas emissions) and mechanisms of abrupt variability such as tipping points. We propose to take the best from both worlds by leveraging deep generative models to produce physically consistent oceanic states that can serve as initial conditions for climate projections. We assess the viability of this hybrid approach through both physical metrics and numerical experiments, and highlight the benefits of enforcing physical constraints during generation. Although we train here on ocean variables from idealized numerical simulations, we claim that this hybrid approach, combining the computational efficiency of deep learning with the physical accuracy of numerical models, can effectively reduce the computational burden of running climate models to equilibrium, and reduce uncertainties in climate projections by minimizing drifts in baseline simulations.


Predicting Air Temperature from Volumetric Urban Morphology with Machine Learning

Kıvılcım, Berk, Bradley, Patrick Erik

arXiv.org Artificial Intelligence

In this study, we first introduce a method that converts CityGML data into voxels efficiently and quickly at high resolution for large-scale datasets such as cities, at the cost of some building detail. This overcomes the limitations of previous voxelization methodologies, which have been computationally intensive and inefficient at transforming large-scale urban areas into high-resolution voxel representations. The voxelized 3D city data from multiple cities and the corresponding air temperature data are used to develop a machine learning model. Before model training, Gaussian blurring is applied to the input data to account for spatial relationships; as a result, the correlation between air temperature and volumetric building morphology also increases. After model training, the prediction results are evaluated not only with Mean Square Error (MSE) but also with image similarity metrics such as Structural Similarity Index Measure (SSIM) and Learned Perceptual Image Patch Similarity (LPIPS), which are able to detect and consider spatial relations during the evaluation process. The trained model is capable of predicting the spatial distribution of air temperature using the building volume information of the corresponding pixel as input. By doing so, this research aims to assist urban planners in incorporating environmental parameters into their planning strategies, thereby facilitating more sustainable and inhabitable urban environments.
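
The blurring and correlation step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the separable Gaussian kernel, the edge clamping, the `sigma` and `radius` defaults, and every function name here are assumptions made for illustration, and the grids stand in for a 2D map of building volume per pixel and a matching air temperature map.

```python
import math

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2 * radius + 1."""
    vals = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(vals)
    return [v / total for v in vals]

def blur_rows(grid, kernel):
    """Convolve each row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    out = []
    for row in grid:
        n = len(row)
        new_row = []
        for j in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = min(max(j + k - radius, 0), n - 1)
                acc += w * row[idx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(grid):
    return [list(col) for col in zip(*grid)]

def gaussian_blur(grid, sigma=1.0, radius=2):
    """Separable 2D Gaussian blur: blur rows, then columns."""
    kernel = gaussian_kernel_1d(sigma, radius)
    rows_done = blur_rows(grid, kernel)
    return transpose(blur_rows(transpose(rows_done), kernel))

def pearson(a, b):
    """Pearson correlation between two equally shaped 2D grids."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    va = sum((x - ma) ** 2 for x in fa)
    vb = sum((y - mb) ** 2 for y in fb)
    return cov / math.sqrt(va * vb)
```

With a hypothetical building volume grid `volume` and temperature grid `temp`, comparing `pearson(volume, temp)` against `pearson(gaussian_blur(volume), temp)` shows how blurring changes the correlation; whether it rises depends on the data.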


HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection

Gu, Zijian, Ma, Jianwei, Huang, Yan, Wei, Honghao, Chen, Zhanye, Zhang, Hui, Hong, Wei

arXiv.org Artificial Intelligence

Millimeter-wave radar plays a vital role in 3D object detection for autonomous driving due to its all-weather and all-lighting-condition capabilities for perception. However, radar point clouds suffer from pronounced sparsity and unavoidable angle estimation errors. To address these limitations, incorporating a camera may partially help mitigate the shortcomings. Nevertheless, the direct fusion of radar and camera data can lead to negative or even opposite effects due to the lack of depth information in images and low-quality image features under adverse lighting conditions. Hence, in this paper, we present the radar-camera fusion network with Hybrid Generation and Synchronization (HGSFusion), designed to better fuse radar potentials and image features for 3D object detection. Specifically, we propose the Radar Hybrid Generation Module (RHGM), which fully considers the Direction-Of-Arrival (DOA) estimation errors in radar signal processing. This module generates denser radar points through different Probability Density Functions (PDFs) with the assistance of semantic information. Meanwhile, we introduce the Dual Sync Module (DSM), comprising spatial sync and modality sync, to enhance image features with radar positional information and facilitate the fusion of distinct characteristics in different modalities. Extensive experiments demonstrate the effectiveness of our approach, outperforming the state-of-the-art methods in the VoD and TJ4DRadSet datasets by 6.53% and 2.03% in RoI AP and BEV AP, respectively. The code is available at https://github.com/garfield-cpp/HGSFusion.


GeoConformal prediction: a model-agnostic framework of measuring the uncertainty of spatial prediction

Lou, Xiayin, Luo, Peng, Meng, Liqiu

arXiv.org Machine Learning

Spatial prediction is a fundamental task in geography. In recent years, with advances in geospatial artificial intelligence (GeoAI), numerous models have been developed to improve the accuracy of geographic variable predictions. Beyond achieving higher accuracy, it is equally important to obtain predictions with uncertainty measures to enhance model credibility and support responsible spatial prediction. Although geostatistical methods like Kriging offer some level of uncertainty assessment, such as Kriging variance, these measurements are not always accurate and lack general applicability to other spatial models. To address this issue, we propose a model-agnostic uncertainty assessment method called GeoConformal Prediction, which incorporates geographical weighting into conformal prediction. We applied it to two classic spatial prediction cases, spatial regression and spatial interpolation, to evaluate its reliability. First, in the spatial regression case, we used XGBoost to predict housing prices, followed by GeoConformal to calculate uncertainty. Our results show that GeoConformal achieved a coverage rate of 93.67%, while Bootstrap methods only reached a maximum coverage of 81.00% after 2000 runs. Next, we applied GeoConformal to spatial interpolation models. We found that the uncertainty obtained from GeoConformal aligned closely with the variance in Kriging. Finally, using GeoConformal, we analyzed the sources of uncertainty in spatial prediction. We found that explicitly including local features in AI models can significantly reduce prediction uncertainty, especially in areas with strong local dependence. Our findings suggest that GeoConformal holds potential not only for geographic knowledge discovery but also for guiding the design of future GeoAI models, paving the way for more reliable and interpretable spatial prediction frameworks.
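
The idea of adding geographical weighting to conformal prediction can be sketched as a weighted split-conformal interval. This is a minimal illustration under stated assumptions, not the paper's implementation: the Gaussian distance kernel, the `bandwidth` parameter, the use of absolute residuals as nonconformity scores, and all function names are hypothetical, and the finite-sample quantile correction used in standard split conformal prediction is omitted for brevity.

```python
import math

def gaussian_weights(cal_coords, test_coord, bandwidth):
    """Kernel weight of each calibration point, decaying with distance."""
    weights = []
    for (x, y) in cal_coords:
        d2 = (x - test_coord[0]) ** 2 + (y - test_coord[1]) ** 2
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    return weights

def weighted_quantile(scores, weights, level):
    """Smallest score whose cumulative normalized weight reaches `level`."""
    total = sum(weights)
    if total <= 0.0:
        # All weights underflowed to zero: fall back to uniform weights.
        weights = [1.0] * len(scores)
        total = float(len(scores))
    pairs = sorted(zip(scores, weights))
    cum = 0.0
    for score, w in pairs:
        cum += w / total
        if cum >= level:
            return score
    return pairs[-1][0]

def geo_conformal_interval(pred, test_coord, cal_coords, cal_residuals,
                           alpha=0.1, bandwidth=1.0):
    """Interval pred +/- q, where q is a locally weighted residual quantile."""
    scores = [abs(r) for r in cal_residuals]  # nonconformity scores
    w = gaussian_weights(cal_coords, test_coord, bandwidth)
    q = weighted_quantile(scores, w, 1.0 - alpha)
    return pred - q, pred + q
```

Because nearby calibration points dominate the quantile, a test location surrounded by small residuals receives a narrow interval and one surrounded by large residuals receives a wide one, for any underlying model that supplies `pred` and the calibration residuals.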


Are Deep Learning Methods Suitable for Downscaling Global Climate Projections? Review and Intercomparison of Existing Models

González-Abad, Jose, Gutiérrez, José Manuel

arXiv.org Machine Learning

Deep Learning (DL) has shown promise for downscaling global climate change projections under different approaches, including Perfect Prognosis (PP) and Regional Climate Model (RCM) emulation. Unlike emulators, PP downscaling models are trained on observational data, so it remains an open question whether they can plausibly extrapolate to unseen conditions and changes in future emissions scenarios. Here we focus on this problem as the main drawback for the operationalization of these methods and present the results of 1) a literature review to identify state-of-the-art DL models for PP downscaling and 2) an intercomparison experiment to evaluate the performance of these models and to assess their extrapolation capability using a common experimental framework, taking into account the sensitivity of results to different training replicas. We focus on minimum and maximum temperatures and precipitation over Spain, a region with a range of climatic conditions with different influential regional processes. We conclude with a discussion of the findings, limitations of existing methods, and prospects for future development.


Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery

Cui, Kangning, Tang, Wei, Zhu, Rongkun, Wang, Manqi, Larsen, Gregory D., Pauca, Victor P., Alqahtani, Sarra, Yang, Fan, Segurado, David, Fine, Paul, Karubian, Jordan, Chan, Raymond H., Plemmons, Robert J., Morel, Jean-Michel, Silman, Miles R.

arXiv.org Artificial Intelligence

Understanding the spatial distribution of palms within tropical forests is essential for effective ecological monitoring, conservation strategies, and the sustainable integration of natural forest products into local and global supply chains. However, the analysis of remotely sensed data in these environments faces significant challenges, such as overlapping palm and tree crowns, uneven shading across the canopy surface, and the heterogeneous nature of the forest landscapes, which often affect the performance of palm detection and segmentation algorithms. To overcome these issues, we introduce PalmDSNet, a deep learning framework for real-time detection, segmentation, and counting of canopy palms. Additionally, we employ a bimodal reproduction algorithm that simulates palm spatial propagation to further enhance the understanding of these point patterns using PalmDSNet's results. We used UAV-captured imagery to create orthomosaics from 21 sites across western Ecuadorian tropical forests, covering a gradient from the everwet Chocó forests near Colombia to the drier forests of southwestern Ecuador. Expert annotations were used to create a comprehensive dataset, including 7,356 bounding boxes on image patches and 7,603 palm centers across five orthomosaics, encompassing a total area of 449 hectares. By combining PalmDSNet with the bimodal reproduction algorithm, which optimizes parameters for both local and global spatial variability, we effectively simulate the spatial distribution of palms in diverse and dense tropical environments, validating its utility for advanced applications in tropical forest monitoring and remote sensing analysis.