AITopics

2508.17449

Country:

Europe > France (0.46)
Asia > China (0.46)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Education > Educational Setting > Online (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Artificial IntelligenceSep-5-2025

Mapping on a Budget: Optimizing Spatial Data Collection for ML

Betti, Livia, Sanni, Farooq, Sogoyou, Gnouyaro, Agbagla, Togbe, Molitor, Cullen, Carleton, Tamma, Rolf, Esther

In applications across agriculture, ecology, and human development, machine learning with satellite imagery (SatML) is limited by the sparsity of labeled training data. While satellite data cover the globe, labeled training datasets for SatML are often small, spatially clustered, and collected for other purposes (e.g., administrative surveys or field measurements). Despite the pervasiveness of this issue in practice, past SatML research has largely focused on new model architectures and training algorithms to handle scarce training data, rather than modeling data conditions directly. This leaves scientists and policymakers who wish to use SatML for large-scale monitoring uncertain about whether and how to collect additional data to maximize performance. Here, we present the first problem formulation for the optimization of spatial training data in the presence of heterogeneous data collection costs and realistic budget constraints, as well as novel methods for addressing this problem. In experiments simulating different problem settings across three continents and four tasks, our strategies reveal substantial gains from sample optimization. Further experiments delineate settings for which optimized sampling is particularly effective. The problem formulation and methods we introduce are designed to generalize across application domains for SatML; we put special emphasis on a specific problem setting where our coauthors can immediately use our findings to augment clustered agricultural surveys for SatML monitoring in Togo.

artificial intelligence, machine learning, spatial reasoning, (17 more...)

2509.03749

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.88)

Industry:

Food & Agriculture > Agriculture (0.49)
Government > Regional Government > North America Government > United States Government (0.46)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.82)

Olivier, François, Bouraoui, Zied

Towards a Neurosymbolic Reasoning System Grounded in Schematic Representations

arXiv.org Artificial IntelligenceSep-5-2025

Despite significant progress in natural language understanding, Large Language Models (LLMs) remain error-prone when performing logical reasoning, often lacking the robust mental representations that enable human-like comprehension. We introduce a prototype neurosymbolic system, Embodied-LM, that grounds understanding and logical reasoning in schematic representations based on image schemas-recurring patterns derived from sensorimotor experience that structure human cognition. Our system operationalizes the spatial foundations of these cognitive structures using declarative spatial reasoning within Answer Set Programming. Through evaluation on logical deduction problems, we demonstrate that LLMs can be guided to interpret scenarios through embodied cognitive structures, that these structures can be formalized as executable programs, and that the resulting representations support effective logical reasoning with enhanced interpretability. While our current implementation focuses on spatial primitives, it establishes the computational foundation for incorporating more complex and dynamic representations.

large language model, logic & formal reasoning, natural language, (20 more...)

2509.03644

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry:

Transportation > Passenger (0.50)
Transportation > Ground > Road (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

arXiv.org Artificial IntelligenceSep-4-2025

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

Shao, Rui, Li, Wei, Zhang, Lingsen, Zhang, Renshan, Liu, Zhiyang, Chen, Ran, Nie, Liqiang

Robotic manipulation, a key frontier in robotics and embodied AI, requires precise motor control and multimodal understanding, yet traditional rule-based methods fail to scale or generalize in unstructured, novel environments. In recent years, Vision-Language-Action (VLA) models, built upon Large Vision-Language Models (VLMs) pretrained on vast image-text datasets, have emerged as a transformative paradigm. This survey provides the first systematic, taxonomy-oriented review of large VLM-based VLA models for robotic manipulation. We begin by clearly defining large VLM-based VLA models and delineating two principal architectural paradigms: (1) monolithic models, encompassing single-system and dual-system designs with differing levels of integration; and (2) hierarchical models, which explicitly decouple planning from execution via interpretable intermediate representations. Building on this foundation, we present an in-depth examination of large VLM-based VLA models: (1) integration with advanced domains, including reinforcement learning, training-free optimization, learning from human videos, and world model integration; (2) synthesis of distinctive characteristics, consolidating architectural traits, operational strengths, and the datasets and benchmarks that support their development; (3) identification of promising directions, including memory mechanisms, 4D perception, efficient adaptation, multi-agent cooperation, and other emerging capabilities. This survey consolidates recent advances to resolve inconsistencies in existing taxonomies, mitigate research fragmentation, and fill a critical gap through the systematic integration of studies at the intersection of large VLMs and robotic manipulation. We provide a regularly updated project page to document ongoing progress: https://github.com/JiuTian-VL/Large-VLM-based-VLA-for-Robotic-Manipulation

large language model, machine learning, reinforcement learning, (20 more...)

2508.13073

Country: Asia > China (0.27)

Genre: Overview (1.00)

Industry: Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

ST-Hyper: Learning High-Order Dependencies Across Multiple Spatial-Temporal Scales for Multivariate Time Series Forecasting

Wu, Binqing, Huang, Jianlong, Shang, Zongjiang, Chen, Ling

In multivariate time series (MTS) forecasting, many deep learning based methods have been proposed for modeling dependencies at multiple spatial (inter-variate) or temporal (intra-variate) scales. However, existing methods may fail to model dependencies across multiple spatial-temporal scales (ST-scales, i.e., scales that jointly consider spatial and temporal scopes). In this work, we propose ST-Hyper to model the high-order dependencies across multiple ST-scales through adaptive hypergraph modeling. Specifically, we introduce a Spatial-Temporal Pyramid Modeling (STPM) module to extract features at multiple ST-scales. Furthermore, we introduce an Adaptive Hypergraph Modeling (AHM) module that learns a sparse hypergraph to capture robust high-order dependencies among features. In addition, we interact with these features through tri-phase hypergraph propagation, which can comprehensively capture multi-scale spatial-temporal dynamics. Experimental results on six real-world MTS datasets demonstrate that ST-Hyper achieves the state-of-the-art performance, outperforming the best baselines with an average MAE reduction of 3.8\% and 6.8\% for long-term and short-term forecasting, respectively.

artificial intelligence, data mining, machine learning, (20 more...)

2509.02217

Country: Asia > China > Zhejiang Province (0.29)

Genre: Research Report (0.64)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TopoNav: Topological Graphs as a Key Enabler for Advanced Object Navigation

Liu, Peiran, Zhang, Qiang, Peng, Daojie, Zhang, Lingfeng, Qin, Yihao, Zhou, Hang, Ma, Jun, Xu, Renjing, Ji, Yiding

Object Navigation (ObjectNav) has made great progress with large language models (LLMs), but still faces challenges in memory management, especially in long-horizon tasks and dynamic scenes. To address this, we propose TopoNav, a new framework that leverages topological structures as spatial memory. By building and updating a topological graph that captures scene connections, adjacency, and semantic meaning, TopoNav helps agents accumulate spatial knowledge over time, retrieve key information, and reason effectively toward distant goals. Our experiments show that TopoNav achieves state-of-the-art performance on benchmark ObjectNav datasets, with higher success rates and more efficient paths. It particularly excels in diverse and complex environments, as it connects temporary visual inputs with lasting spatial understanding.

large language model, machine learning, natural language, (16 more...)

2509.01364

Country: Asia > China (0.68)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Embodied Spatial Intelligence: from Implicit Scene Modeling to Spatial Reasoning

Fang, Jiading

large language model, machine learning, pattern analysis and machine intelligence, (18 more...)

This thesis introduces "Embodied Spatial Intelligence" to address the challenge of creating robots that can perceive and act in the real world based on natural language instructions. To bridge the gap between Large Language Models (LLMs) and physical embodiment, we present contributions on two fronts: scene representation and spatial reasoning. For perception, we develop robust, scalable, and accurate scene representations using implicit neural models, with contributions in self-supervised camera calibration, high-fidelity depth field generation, and large-scale reconstruction. For spatial reasoning, we enhance the spatial capabilities of LLMs by introducing a novel navigation benchmark, a method for grounding language in 3D, and a state-feedback mechanism to improve long-horizon decision-making. This work lays a foundation for robots that can robustly perceive their surroundings and intelligently act upon complex, language-based commands.

2509.00465

Country: North America > United States (0.92)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Education (0.92)
Media > Photography (0.92)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment

Tang, Jinzhou, zhang, Jusheng, Liu, Sidi, Xiu, Waikit, Lv, Qinhan, Li, Xiying

Achieving human-like reasoning in deep learning models for complex tasks in unknown environments remains a critical challenge in embodied intelligence. While advanced vision-language models (VLMs) excel in static scene understanding, their limitations in spatio-temporal reasoning and adaptation to dynamic, open-set tasks like task-oriented navigation and embodied question answering (EQA) persist due to inadequate modeling of fine-grained spatio-temporal cues and physical world comprehension. To address this, we propose VEME, a novel cross-modal alignment method that enhances generalization in unseen scenes by learning an ego-centric, experience-centered world model. Our framework integrates three key components: (1) a cross-modal alignment framework bridging objects, spatial representations, and visual semantics with spatio-temporal cues to enhance VLM in-context learning; (2) a dynamic, implicit cognitive map activated by world embedding to enable task-relevant geometric-semantic memory recall; and (3) an instruction-based navigation and reasoning framework leveraging embodied priors for long-term planning and efficient exploration. By embedding geometry-aware spatio-temporal episodic experiences, our method significantly improves reasoning and planning in dynamic environments. Experimental results on VSI-Bench and VLN-CE demonstrate 1%-3% accuracy and exploration efficiency improvement compared to traditional approaches.

large language model, machine learning, natural language, (19 more...)

2509.0021

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.46)
Health & Medicine > Consumer Health (0.36)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
(3 more...)

arXiv.org Artificial IntelligenceSep-1-2025

SignLoc: Robust Localization using Navigation Signs and Public Maps

Zimmerman, Nicky, Loo, Joel, Agrawal, Ayush, Hsu, David

To localize, it matches these cues to a large-scale, indoor-outdoor navigation graph, constructed from publicly available maps. Abstract -- Navigation signs and maps, such as floor plans and street maps, are widely available and serve as ubiquitous aids for way-finding in human environments. Y et, they are rarely used by robot systems. This paper presents SignLoc, a global localization method that leverages navigation signs to localize the robot on publicly available maps--specifically floor plans and OpenStreetMap (OSM) graphs-without prior sensor-based mapping. It then employs a probabilistic observation model to match directional and locational cues from the detected signs to the graph, enabling robust topo-semantic localization within a Monte Carlo framework. We evaluated SignLoc in diverse large-scale environments: part of a university campus, a shopping mall, and a hospital complex. Experimental results show that SignLoc reliably localizes the robot after observing only one to two signs. Localizing and navigating in the open world remains a challenge for robots due to the diversity and complexity of human environments.

artificial intelligence, localization, natural language, (19 more...)

2508.18606

Genre: Research Report (0.70)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)

Bovbjerg, Holger Severin, Østergaard, Jan, Jensen, Jesper, Watanabe, Shinji, Tan, Zheng-Hua

Learning Robust Spatial Representations from Binaural Audio through Feature Distillation

arXiv.org Artificial IntelligenceAug-29-2025

Recently, deep representation learning has shown strong performance in multiple audio tasks. However, its use for learning spatial representations from multichannel audio is underexplored. We investigate the use of a pretraining stage based on feature distillation to learn a robust spatial representation of binaural speech without the need for data labels. In this framework, spatial features are computed from clean binaural speech samples to form prediction labels. These clean features are then predicted from corresponding augmented speech using a neural network. After pretraining, we throw away the spatial feature predictor and use the learned encoder weights to initialize a DoA estimation model which we fine-tune for DoA estimation. Our experiments demonstrate that the pretrained models show improved performance in noisy and reverberant environments after fine-tuning for direction-of-arrival estimation, when compared to fully supervised models and classic signal processing methods.

artificial intelligence, machine learning, representation, (18 more...)

2508.20914

Country: Europe > Denmark (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)