AITopics

2508.04846

Country: Asia > Middle East > Iran (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.86)

arXiv.org Artificial IntelligenceAug-8-2025

Information-Theoretic Graph Fusion with Vision-Language-Action Model for Policy Reasoning and Dual Robotic Control

Li, Shunlei, Gao, Longsen, Wang, Jin, Che, Chang, Xiao, Xi, Cao, Jiuwen, Hu, Yingbai, Karimi, Hamid Reza

Teaching robots dexterous skills from human videos remains challenging due to the reliance on low-level trajectory imitation, which fails to generalize across object types, spatial layouts, and manipulator configurations. We propose Graph-Fused Vision-Language-Action (GF-VLA), a framework that enables dual-arm robotic systems to perform task-level reasoning and execution directly from RGB and Depth human demonstrations. GF-VLA first extracts Shannon-information-based cues to identify hands and objects with the highest task relevance, then encodes these cues into temporally ordered scene graphs that capture both hand-object and object-object interactions. These graphs are fused with a language-conditioned transformer that generates hierarchical behavior trees and interpretable Cartesian motion commands. To improve execution efficiency in bimanual settings, we further introduce a cross-hand selection policy that infers optimal gripper assignment without explicit geometric reasoning. We evaluate GF-VLA on four structured dual-arm block assembly tasks involving symbolic shape construction and spatial generalization. Experimental results show that the information-theoretic scene representation achieves over 95 percent graph accuracy and 93 percent subtask segmentation, supporting the LLM planner in generating reliable and human-readable task policies. When executed by the dual-arm robot, these policies yield 94 percent grasp success, 89 percent placement accuracy, and 90 percent overall task success across stacking, letter-building, and geometric reconfiguration scenarios, demonstrating strong generalization and robustness across diverse spatial and semantic variations.

large language model, machine learning, natural language, (20 more...)

2508.05342

Country:

Europe (1.00)
North America > United States (0.67)
Asia > China > Zhejiang Province (0.28)

Genre: Research Report > New Finding (0.87)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.66)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.48)
(2 more...)

arXiv.org Artificial IntelligenceAug-8-2025

Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses

Han, Bin, Wolfe, Robert, Caspi, Anat, Howe, Bill

We explore the application of large language models (LLMs) to empower domain experts in integrating large, heterogeneous, and noisy urban spatial datasets. Traditional rule-based integration methods are unable to cover all edge cases, requiring manual verification and repair. Machine learning approaches require collecting and labeling of large numbers of task-specific samples. In this study, we investigate the potential of LLMs for spatial data integration. Our analysis first considers how LLMs reason about environmental spatial relationships mediated by human experience, such as between roads and sidewalks. We show that while LLMs exhibit spatial reasoning capabilities, they struggle to connect the macro-scale environment with the relevant computational geometry tasks, often producing logically incoherent responses. But when provided relevant features, thereby reducing dependence on spatial reasoning, LLMs are able to generate high-performing results. We then adapt a review-and-refine method, which proves remarkably effective in correcting erroneous initial responses while preserving accurate responses. We discuss practical implications of employing LLMs for spatial data integration in real-world contexts and outline future research directions, including post-training, multi-modal integration methods, and support for diverse data formats. Our findings position LLMs as a promising and flexible alternative to traditional rule-based heuristics, advancing the capabilities of adaptive spatial data integration.

large language model, machine learning, natural language, (21 more...)

2508.05009

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The GuardianAug-7-2025, 23:20:19 GMT

South Korea set to decide whether to let Google Maps finally work properly

For tourists visiting South Korea, one of the world's most technologically advanced nations, navigating the country's urban heartlands can prove surprisingly frustrating for one simple reason: Google Maps just doesn't work effectively. But on 11 August that could change, as South Korean authorities are set to decide whether to finally grant Google's request to export the country's detailed mapping data to overseas servers. Such a move would open up functionality that allows the app to give detailed directions and show users the best routes to travel. It is a debate spanning nearly two decades which has evolved into a broader test of how democracies balance digital sovereignty with economic openness. Local industry groups are warning of market domination from foreign companies, while those who back Google's request argue restrictions harm tourism and innovation.

google, mapping data, south korea, (11 more...)

The Guardian

Country:

North America > United States (0.16)
Asia > China (0.06)
Europe > Italy (0.05)
(3 more...)

Industry:

Information Technology > Services (0.77)
Government > Regional Government (0.51)

Technology:

Information Technology > Geographic Information Systems (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.51)

RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case

Xiao, Baihui, Feng, Chengjian, Huang, Zhijian, yan, Feng, Zhong, Yujie, Ma, Lin

Collecting real-world data for rare high-risk scenarios, long-tailed driving events, and complex interactions remains challenging, leading to poor performance of existing autonomous driving systems in these critical situations. In this paper, we propose RoboTron-Sim that improves real-world driving in critical situations by utilizing simulated hard cases. First, we develop a simulated dataset called Hard-case Augmented Synthetic Scenarios (HASS), which covers 13 high-risk edge-case categories, as well as balanced environmental conditions such as day/night and sunny/rainy. Second, we introduce Scenario-aware Prompt Engineering (SPE) and an Image-to-Ego Encoder (I2E Encoder) to enable multimodal large language models to effectively learn real-world challenging driving skills from HASS, via adapting to environmental deviations and hardware differences between real-world and simulated scenarios. Extensive experiments on nuScenes show that RoboTron-Sim improves driving performance in challenging scenarios by around 50%, achieving state-of-the-art results in real-world open-loop planning. Qualitative results further demonstrate the effectiveness of RoboTron-Sim in better managing rare high-risk driving scenarios. Project page: https://stars79689.github.io/RoboTron-Sim/

large language model, machine learning, natural language, (17 more...)

2508.04642

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)

$NavA^3$: Understanding Any Instruction, Navigating Anywhere, Finding Anything

Zhang, Lingfeng, Hao, Xiaoshuai, Tang, Yingbo, Fu, Haoxiang, Zheng, Xinyu, Wang, Pengwei, Wang, Zhongyuan, Ding, Wenbo, Zhang, Shanghang

Embodied navigation is a fundamental capability of embodied intelligence, enabling robots to move and interact within physical environments. However, existing navigation tasks primarily focus on predefined object navigation or instruction following, which significantly differs from human needs in real-world scenarios involving complex, open-ended scenes. To bridge this gap, we introduce a challenging long-horizon navigation task that requires understanding high-level human instructions and performing spatial-aware object navigation in real-world environments. Existing embodied navigation methods struggle with such tasks due to their limitations in comprehending high-level human instructions and localizing objects with an open vocabulary. In this paper, we propose $NavA^3$, a hierarchical framework divided into two stages: global and local policies. In the global policy, we leverage the reasoning capabilities of Reasoning-VLM to parse high-level human instructions and integrate them with global 3D scene views. This allows us to reason and navigate to regions most likely to contain the goal object. In the local policy, we have collected a dataset of 1.0 million samples of spatial-aware object affordances to train the NaviAfford model (PointingVLM), which provides robust open-vocabulary object localization and spatial awareness for precise goal identification and navigation in complex environments. Extensive experiments demonstrate that $NavA^3$ achieves SOTA results in navigation performance and can successfully complete longhorizon navigation tasks across different robot embodiments in real-world settings, paving the way for universal embodied navigation. The dataset and code will be made available. Project website: https://NavigationA3.github.io/.

large language model, machine learning, natural language, (19 more...)

2508.04598

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(4 more...)

Segment Any Vehicle: Semantic and Visual Context Driven SAM and A Benchmark

Wang, Xiao, Wang, Ziwen, Wu, Wentao, Wang, Anjie, Wu, Jiashu, Pan, Yantao, Li, Chenglong

--With the rapid advancement of autonomous driving, vehicle perception, particularly detection and segmentation, has placed increasingly higher demands on algorithmic performance. Pre-trained large segmentation models, especially Segment Anything Model (SAM), have sparked significant interest and inspired new research directions in artificial intelligence. However, SAM cannot be directly applied to the fine-grained task of vehicle part segmentation, as its text-prompted segmentation functionality is not publicly accessible, and the mask regions generated by its default mode lack semantic labels, limiting its utility in structured, category-specific segmentation tasks. T o address these limitations, we propose SA V, a novel framework comprising three core components: a SAM-based encoder-decoder, a vehicle part knowledge graph, and a context sample retrieval encoding module. The knowledge graph explicitly models the spatial and geometric relationships among vehicle parts through a structured ontology, effectively encoding prior structural knowledge. Meanwhile, the context retrieval module enhances segmentation by identifying and leveraging visually similar vehicle instances from training data, providing rich contextual priors for improved generalization. Furthermore, we introduce a new large-scale benchmark dataset for vehicle part segmentation, named V ehicleSeg10K, which contains 11,665 high-quality pixel-level annotations across diverse scenes and viewpoints. We conduct comprehensive experiments on this dataset and two other datasets, benchmarking multiple representative baselines to establish a solid foundation for future research and comparison. Both the dataset and source code of this paper will be released on https://github.com/Event-AHU/SA EHICLE segmentation [1] plays a crucial role in modern intelligent transportation systems [2]-[4], serving as a fundamental component for autonomous driving, advanced driver-assistance systems (ADAS), and intelligent traffic management.

machine learning, natural language, segmentation, (20 more...)

2508.0426

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
(2 more...)

Spatial-Frequency Aware for Object Detection in RAW Image

Ye, Zhuohua, Zhang, Liming, Han, Hongru

Direct RAW-based object detection offers great promise by utilizing RAW data (unprocessed sensor data), but faces inherent challenges due to its wide dynamic range and linear response, which tends to suppress crucial object details. In particular, existing enhancement methods are almost all performed in the spatial domain, making it difficult to effectively recover these suppressed details from the skewed pixel distribution of RAW images. To address this limitation, we turn to the frequency domain, where features, such as object contours and textures, can be naturally separated based on frequency. In this paper, we propose Space-Frequency Aware RAW Image Object Detection Enhancer (SFAE), a novel framework that synergizes spatial and frequency representations. Our contribution is threefold. The first lies in the ``spatialization" of frequency bands. Different from the traditional paradigm of directly manipulating abstract spectra in deep networks, our method inversely transforms individual frequency bands back into tangible spatial maps, thus preserving direct physical intuition. Then the cross-domain fusion attention module is developed to enable deep multimodal interactions between these maps and the original spatial features. Finally, the framework performs adaptive nonlinear adjustments by predicting and applying different gamma parameters for the two domains.

artificial intelligence, computer vision, machine learning, (12 more...)

2508.01396

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.67)

arXiv.org Artificial IntelligenceAug-6-2025

Spatial Imputation Drives Cross-Domain Alignment for EEG Classification

Liu, Hongjun, Yao, Chao, Zhang, Yalan, wang, Xiaokun, Ban, Xiaojuan

Electroencephalogram (EEG) signal classification faces significant challenges due to data distribution shifts caused by heterogeneous electrode configurations, acquisition protocols, and hardware discrepancies across domains. This paper introduces IMAC, a novel channel-dependent mask and imputation self-supervised framework that formulates the alignment of cross-domain EEG data shifts as a spatial time series imputation task. To address heterogeneous electrode configurations in cross-domain scenarios, IMAC first standardizes different electrode layouts using a 3D-to-2D positional unification mapping strategy, establishing unified spatial representations. Unlike previous mask-based self-supervised representation learning methods, IMAC introduces spatio-temporal signal alignment. This involves constructing a channel-dependent mask and reconstruction task framed as a low-to-high resolution EEG spatial imputation problem. Consequently, this approach simulates cross-domain variations such as channel omissions and temporal instabilities, thus enabling the model to leverage the proposed imputer for robust signal alignment during inference. Furthermore, IMAC incorporates a disentangled structure that separately models the temporal and spatial information of the EEG signals separately, reducing computational complexity while enhancing flexibility and adaptability. Comprehensive evaluations across 10 publicly available EEG datasets demonstrate IMAC's superior performance, achieving state-of-the-art classification accuracy in both cross-subject and cross-center validation scenarios. Notably, IMAC shows strong robustness under both simulated and real-world distribution shifts, surpassing baseline methods by up to $35$\% in integrity scores while maintaining consistent classification accuracy.

artificial intelligence, machine learning, spatial reasoning, (12 more...)

doi: 10.1145/3746027.3755582

2508.03437

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Gajer, Pawel, Ravel, Jacques

The Geometry of Machine Learning Models

arXiv.org Artificial IntelligenceAug-5-2025

This paper presents a mathematical framework for analyzing machine learning models through the geometry of their induced partitions. By representing partitions as Riemannian simplicial complexes, we capture not only adjacency relationships but also geometric properties including cell volumes, volumes of faces where cells meet, and dihedral angles between adjacent cells. For neural networks, we introduce a differential forms approach that tracks geometric structure through layers via pullback operations, making computations tractable by focusing on data-containing cells. The framework enables geometric regularization that directly penalizes problematic spatial configurations and provides new tools for model refinement through extended Laplacians and simplicial splines. We also explore how data distribution induces effective geometric curvature in model partitions, developing discrete curvature measures for vertices that quantify local geometric complexity and statistical Ricci curvature for edges that captures pairwise relationships between cells. While focused on mathematical foundations, this geometric perspective offers new approaches to model interpretation, regularization, and diagnostic tools for understanding learning dynamics.

artificial intelligence, machine learning, spatial reasoning, (20 more...)

2508.0208

Country: North America > United States (1.00)

Genre: Research Report (0.81)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)