AITopics | spatial reasoning

Collaborating Authors

spatial reasoning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Relational and Sequential Conformal Inference for Energy Time Series over Graphs via Foundation Models

Niresi, Keivan Faghih, Cicirello, Alice, Fink, Olga

arXiv.org Machine LearningJul-1-2026

Accurate energy demand forecasting is essential for the reliable operation and planning of modern sustainable energy systems. Spatial-temporal graph neural networks (STGNNs) have recently achieved strong performance in point forecasting by jointly modeling temporal dynamics and relational dependencies across interconnected energy nodes. However, in real-world energy systems, accurate point forecasts alone are insufficient, as operators also require reliable uncertainty estimates to support risk-aware decision-making, grid stability, and operational planning under uncertainty. Conformal prediction provides a principled and model-agnostic framework for uncertainty quantification with statistical coverage guarantees, making it particularly attractive for safety-critical energy applications. However, existing conformal prediction approaches often fail to fully capture the complex spatial-temporal structure of energy systems. To address these limitations, we propose STOIC (Spatial-Temporal Graph Conformal Prediction with In-Context Learning), a novel framework that integrates graph-based forecasting with the zero-shot calibration capabilities of tabular foundation models. STOIC first generates point forecasts using an STGNN and subsequently reformulates spatial-temporal residuals into a tabular representation suitable for in-context learning. Leveraging a tabular foundation model, STOIC calibrates prediction intervals without task-specific retraining, effectively capturing both sequential and relational dependencies. We evaluate STOIC on five diverse benchmarks, including synthetic simulations as well as real-world electricity and district heating networks. Across all datasets, STOIC consistently outperforms existing conformal prediction baselines, delivering more reliable and robust uncertainty estimates for complex graph-structured energy time series.

artificial intelligence, forecasting, machine learning, (19 more...)

arXiv.org Machine Learning

2606.31804

Country:

Europe > Switzerland (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Energy > Renewable (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Statistical and Structural Approaches to Algorithmic Fairness

Ferrara, Antonio

arXiv.org Machine LearningJun-26-2026

Modern machine learning systems have outgrown their origins as isolated predictive constructs, evolving into complex socio-technical architectures that actively mediate human opportunity. As algorithms increasingly determine access to economic and social opportunities, it has become widely recognized that these systems are deeply embedded with the structural inequalities and prejudices of their environments. The field of algorithmic fairness emerged in response to the growing recognition that models optimized for predictive accuracy can systematically disadvantage marginalized groups. Early mitigation strategies, however, rested on fragile simplifications that limited their effectiveness in complex sociotechnical environments. This thesis identifies and addresses two fundamental limitations of contemporary fairness paradigms: the reliance on deterministic point estimates for auditing and the treatment of individuals as isolated entities devoid of structural context. First, the diagnosis of algorithmic unfairness has traditionally depended on scalar metrics that fail to capture the nuances of real-world deployment. This deterministic approach ignores the high statistical variance inherent in small, intersectional groups, often leading to false alarms or missed detections of bias. Furthermore, standard auditing struggles with the opacity of black-box models, frequently conflating unjustifiable bias with the influence of legitimate features.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2606.262

Country:

North America > United States (1.00)
Europe > United Kingdom > England (1.00)
Europe > Germany (1.00)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Leisure & Entertainment (1.00)
Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
(7 more...)

Add feedback

Stitch and Tell Data Augmentation Method for Spatial Understanding

Neural Information Processing SystemsJun-23-2026, 06:23:01 GMT

Existing vision-language models often suffer from spatial hallucinations, i.e., generating incorrect descriptions about the relative positions of objects in an image. We argue that this problem mainly stems from the asymmetric properties between images and text. To enrich the spatial understanding ability of vision-language models, we propose a simple, annotation-free, plug-and-play method named Stitch and Tell (abbreviated as SiTe), which injects structured spatial supervision into multimodal data. It constructs stitched image-text pairs by stitching images along a spatial axis and generating spatially-aware captions or question answer pairs based on the layout of stitched image, without relying on costly advanced models or human involvement. We evaluate SiTe across three architectures including LLaVA-v1.5-7B,

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.67)
Asia (0.67)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.95)
(2 more...)

Add feedback

GeoLink: Empowering Remote Sensing Foundation Model with OpenStreetMap Data

Neural Information Processing SystemsJun-23-2026, 03:12:46 GMT

Integrating ground-level geospatial data with rich geographic context, like OpenStreetMap (OSM), into remote sensing (RS) foundation models (FMs) is essential for advancing geospatial intelligence and supporting a broad spectrum of tasks. However, modality gap between RS and OSM data, including differences in data structure, content, and spatial granularity, makes effective synergy highly challenging, and most existing RSFMs focus on imagery alone. To this end, this study presents GeoLink, a multimodal framework that leverages OSM data to enhance RSFM during both the pretraining and downstream task stages. Specifically, GeoLink enhances RS self-supervised pretraining using multi-granularity learning signals derived from OSM data, guided by cross-modal spatial correlations for information interaction and collaboration. It also introduces image maskreconstruction to enable sparse input for efficient pretraining. For downstream tasks, GeoLink generates both unimodal and multimodal fine-grained encodings to support a wide range of applications, from common RS interpretation tasks like land cover classification to more comprehensive geographic tasks like urban function zone mapping. Extensive experiments show that incorporating OSM data during pretraining enhances the performance of the RS image encoder, while fusing RS and OSM data in downstream tasks improves the FM's adaptability to complex geographic scenarios. These results underscore the potential of multimodal synergy in advancing high-level geospatial artificial intelligence. Moreover, we find that spatial correlation plays a crucial role in enabling effective multimodal geospatial data integration.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.68)
Asia > China (0.47)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.73)
Information Technology > Services (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Geographic Information Systems (1.00)
Information Technology > Data Science (1.00)
(4 more...)

Add feedback

Learning Spatial-Aware Manipulation Ordering

Neural Information Processing SystemsJun-23-2026, 02:16:31 GMT

Manipulation in cluttered environments is challenging due to spatial dependencies among objects, where an improper manipulation order can cause collisions or blocked access. Existing approaches often overlook these spatial relationships, limiting their flexibility and scalability. To address these limitations, we propose OrderMind, a unified spatial-aware manipulation ordering framework that directly learns object manipulation priorities based on spatial context.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
(3 more...)

Add feedback

Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs

Neural Information Processing SystemsJun-23-2026, 01:57:43 GMT

Despite progress in Large Vision-Language Models (LVLMs), their capacity for visual reasoning is often limited by the binding problem: the failure to reliably associate perceptual features with their correct visual referents.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

L2RSI: Cross-view LiDAR-based Place Recognition for Large-scale Urban Scenes via Remote Sensing Imagery

Neural Information Processing SystemsJun-23-2026, 01:31:25 GMT

We tackle the challenge of LiDAR-based place recognition, which traditionally depends on costly and time-consuming prior 3D maps. To overcome this, we first construct LiRSI-XA dataset, which encompasses approximately 110,000 remote sensing submaps and 13,000 LiDAR point cloud submaps captured in urban scenes, and propose a novel method, L2RSI, for cross-view LiDAR place recognition using high-resolution Remote Sensing Imagery. This approach enables large-scale localization capabilities at a reduced cost by leveraging readily available overhead images as map proxies. L2RSI addresses the dual challenges of cross-view and cross-modal place recognition by learning feature alignment between point cloud submaps and remote sensing submaps in the semantic domain. Additionally, we introduce a novel probability propagation method based on particle estimation to refine position predictions, effectively leveraging temporal and spatial information. This approach enables large-scale retrieval and cross-scene generalization without fine-tuning. Extensive experiments on LiRSI-XA demonstrate that, within a 100km2 retrieval range, L2RSI accurately localizes 83.27% of point cloud submaps within a 30m radius for top-1 retrieved location. Our project page is publicly available at https://shizw695.github.io/L2RSI/.

machine learning, natural language, recognition, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(3 more...)

Add feedback

MetaFind: Scene-Aware 3DAsset Retrieval for Coherent Metaverse Scene Generation

Neural Information Processing SystemsJun-23-2026, 00:58:10 GMT

We present MetaFind, a scene-aware tri-modal compositional retrieval framework designed to enhance scene generation in the metaverse by retrieving 3D assets from large-scale repositories. MetaFind addresses two core challenges: (i) inconsistent asset retrieval that overlooks spatial, semantic, and stylistic constraints, and (ii) the absence of a standardized retrieval paradigm specifically tailored for 3D asset retrieval, as existing approaches mainly rely on general-purpose 3D shape representation models. Our key innovation is a flexible retrieval mechanism that supports arbitrary combinations of text, image, and 3D modalities as queries, enhancing spatial reasoning and style consistency by jointly modeling object-level features (including appearance) and scene-level layout structures. Methodologically, MetaFind introduces a plug-and-play equivariant layout encoder ESSGNN that captures spatial relationships and object appearance features, ensuring retrieved 3D assets are contextually and stylistically coherent with the existing scene, regardless of coordinate frame transformations. The framework supports iterative scene construction by continuously adapting retrieval results to current scene updates. Empirical evaluations demonstrate the improved spatial and stylistic consistency of MetaFind in various retrieval tasks compared to baseline methods.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Add feedback

e75dcde67f29d1e8efd0a86aaa332331-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsJun-23-2026, 00:44:16 GMT

Large Multimodal Models (LMMs) has demonstrated capabilities across various domains, but comprehensive benchmarks for agricultural remote sensing (RS) remain notable scarce.

large language model, machine learning, natural language, (24 more...)

Neural Information Processing Systems

Country:

Asia (0.93)
Europe (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry:

Food & Agriculture > Agriculture (1.00)
Health & Medicine (0.67)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.36)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(5 more...)

Add feedback

e3a0db7c0a191854c176af1d20cdec80-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsJun-23-2026, 00:03:15 GMT

The descriptions of each task are as follows:799 Single-view tasks Single-view tasks test a model's ability to infer spatial properties from a single800 image. These tasks include:801 Depth estimation (OC, OO, NA): Predicting absolute or relative depth values for objects802 Distance prediction (OC, OO, NA): Estimating the Euclidean distance between objects or803 from an object to the camera.804 Object center distance inference (OO, MCA): Given objects A, B and C, determine which805 of B and C is farther or closer to A.806 Object spatial relation (OO, MCA): Determining relative positioning (e.g., left, right, in807 Spatial imagination (OC, OO, MCA): Predicting unseen spatial relationships based on809 limited visual information.810 Multi-view tasks Multi-view tasks require reasoning across multiple images to infer spatial rela-811 tionships. These tasks include:812 Viewpoint change inference (NA): Given two perspectives, output how the camera should813 be moved to see the second perspective.814 Multi-view distance prediction (OC, OO, NA): Estimating object distances across different816 views.817 Multi-view object matching (MCA): Identifying the same object across multiple views.818

artificial intelligence, image understanding, spatial reasoning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.48)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.34)

Add feedback