AITopics

2505.05515

Country:

North America > United States (0.45)
Asia > China (0.28)
Europe > United Kingdom > England (0.27)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
(12 more...)

Taguchi, Shun, Deguchi, Hideki, Hamazaki, Takumi, Sakai, Hiroyuki

SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models

arXiv.org Artificial IntelligenceMay-9-2025

This study introduces SpatialPrompting, a novel framework that harnesses the emergent reasoning capabilities of off-the-shelf multimodal large language models to achieve zero-shot spatial reasoning in three-dimensional (3D) environments. Unlike existing methods that rely on expensive 3D-specific fine-tuning with specialized 3D inputs such as point clouds or voxel-based features, SpatialPrompting employs a keyframe-driven prompt generation strategy. This framework uses metrics such as vision-language similarity, Mahalanobis distance, field of view, and image sharpness to select a diverse and informative set of keyframes from image sequences and then integrates them with corresponding camera pose data to effectively abstract spatial relationships and infer complex 3D structures. The proposed framework not only establishes a new paradigm for flexible spatial reasoning that utilizes intuitive visual and positional cues but also achieves state-of-the-art zero-shot performance on benchmark datasets, such as ScanQA and SQA3D, across several metrics. The proposed method effectively eliminates the need for specialized 3D inputs and fine-tuning, offering a simpler and more scalable alternative to conventional approaches.

large language model, machine learning, natural language, (16 more...)

2505.04911

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial IntelligenceMay-7-2025

FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios

Zhang, Shiyi, Zhuang, Junhao, Zhang, Zhaoyang, Shan, Ying, Tang, Yansong

Action customization involves generating videos where the subject performs actions dictated by input control signals. Current methods use pose-guided or global motion customization but are limited by strict constraints on spatial structure, such as layout, skeleton, and viewpoint consistency, reducing adaptability across diverse subjects and scenarios. To overcome these limitations, we propose FlexiAct, which transfers actions from a reference video to an arbitrary target image. Unlike existing methods, FlexiAct allows for variations in layout, viewpoint, and skeletal structure between the subject of the reference video and the target image, while maintaining identity consistency. Achieving this requires precise action control, spatial structure adaptation, and consistency preservation. To this end, we introduce RefAdapter, a lightweight image-conditioned adapter that excels in spatial adaptation and consistency preservation, surpassing existing methods in balancing appearance consistency and structural flexibility. Additionally, based on our observations, the denoising process exhibits varying levels of attention to motion (low frequency) and appearance details (high frequency) at different timesteps. So we propose FAE (Frequency-aware Action Extraction), which, unlike existing methods that rely on separate spatial-temporal architectures, directly achieves action extraction during the denoising process. Experiments demonstrate that our method effectively transfers actions to subjects with diverse layouts, skeletons, and viewpoints. We release our code and model weights to support further research at https://shiyi-zh0408.github.io/projectpages/FlexiAct/

machine learning, natural language, reference video, (15 more...)

2505.0373

Country:

North America > Canada (0.17)
Asia > China (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
(2 more...)

WIREDMay-5-2025, 14:00:00 GMT

Sightful Spacetop for Windows Review: Spatial Computing Is Here

I've been eagerly awaiting the advent of spatial computing. My home office desk setup, with multiple screens and browser windows, helps me be very productive. But on the go, I'm relegated to a laptop's 13-inch screen (or packing a portable monitor), and I'm not as efficient. Spatial computing--usually driven by a mixed reality headset or smart glasses--lets you craft a multi-monitor virtual workspace, where you can place apps and browser windows around your periphery, to replicate the experience you have set up at home or the office. Or you can take it a step further because you're only limited by your imagination.

artificial intelligence, spacetop, spatial reasoning, (12 more...)

WIRED

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.88)

Chapuma, Evelyn, Mengezi, Grey, Msasa, Lewis, Taylor, Amelia

mwBTFreddy: A Dataset for Flash Flood Damage Assessment in Urban Malawi

arXiv.org Artificial IntelligenceMay-5-2025

This paper describes the mwBTFreddy dataset, a resource developed to support flash flood damage assessment in urban Malawi, specifically focusing on the impacts of Cyclone Freddy in 2023. The dataset comprises paired pre- and post-disaster satellite images sourced from Google Earth Pro, accompanied by JSON files containing labelled building annotations with geographic coordinates and damage levels (no damage, minor, major, or destroyed). Developed by the Kuyesera AI Lab at the Malawi University of Business and Applied Sciences, this dataset is intended to facilitate the development of machine learning models tailored to building detection and damage classification in African urban contexts. It also supports flood damage visualisation and spatial analysis to inform decisions on relocation, infrastructure planning, and emergency response in climate-vulnerable regions.

artificial intelligence, machine learning, spatial reasoning, (17 more...)

2505.01242

Country:

North America > United States > California (0.14)
Africa > Malawi > Southern Region (0.14)

Genre: Research Report (0.82)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)

Explainable AI in Spatial Analysis

Li, Ziqi

A key objective in spatial analysis is to model spatial relationships and infer spatial processes to generate knowledge from spatial data, which has been largely based on spatial statistical methods. More recently, machine learning offers scalable and flexible approach es that complement traditional methods and has been increasingly applied in spatial data science . Despite its advantages, machine learning is often criticized for being a black box, which limits our understanding of model behavior and output . Recognizing this limitation, XAI has emerged as a pivotal field in AI that provides methods to explain the output of machine learning models to enhance transparency and understanding. These methods are crucial for model diagnosis, bias detection, and ensuring the reliability of results obtained from machine learning models. This chapter introduces key concepts and methods in XAI with a focus on Shapley value - based approach es, which is arguably the most popular XAI method, and their integration with spatial analysis. An empirical example of county - level voting behaviors in the 2020 Presidential election is presented to demonstrate the use of Shapley values and spatial analysis with a comparison to multi - scale geograp hically weighted regression . The chapter concludes with a discussion on the challenges and limitations of current XAI techniques and proposes new directions .

artificial intelligence, machine learning, shapley value, (16 more...)

2505.00591

Country: North America > United States (1.00)

Genre:

Summary/Review (0.66)
Research Report (0.64)

Industry:

Transportation (0.88)
Government > Regional Government > North America Government > United States Government (0.68)
Government > Voting & Elections (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

DeepSTA: A Spatial-Temporal Attention Network for Logistics Delivery Timely Rate Prediction in Anomaly Conditions

Yi, Jinhui, Yan, Huan, Wang, Haotian, Yuan, Jian, Li, Yong

Prediction of couriers' delivery timely rates in advance is essential to the logistics industry, enabling companies to take preemptive measures to ensure the normal operation of delivery services. This becomes even more critical during anomaly conditions like the epidemic outbreak, during which couriers' delivery timely rate will decline markedly and fluctuates significantly. Existing studies pay less attention to the logistics scenario. Moreover, many works focusing on prediction tasks in anomaly scenarios fail to explicitly model abnormal events, e.g., treating external factors equally with other features, resulting in great information loss. Further, since some anomalous events occur infrequently, traditional data-driven methods perform poorly in these scenarios. To deal with them, we propose a deep spatial-temporal attention model, named DeepSTA. To be specific, to avoid information loss, we design an anomaly spatio-temporal learning module that employs a recurrent neural network to model incident information. Additionally, we utilize Node2vec to model correlations between road districts, and adopt graph neural networks and long short-term memory to capture the spatial-temporal dependencies of couriers. To tackle the issue of insufficient training data in abnormal circumstances, we propose an anomaly pattern attention module that adopts a memory network for couriers' anomaly feature patterns storage via attention mechanisms. The experiments on real-world logistics datasets during the COVID-19 outbreak in 2022 show the model outperforms the best baselines by 12.11% in MAE and 13.71% in MSE, demonstrating its superior performance over multiple competitive baselines.

artificial intelligence, courier, machine learning, (15 more...)

doi: 10.1145/3583780.3614671

2505.00402

Country: Asia > China (0.48)

Genre: Research Report (0.50)

Industry:

Transportation > Freight & Logistics Services (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.90)
Health & Medicine > Therapeutic Area > Immunology (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning to Estimate Package Delivery Time in Mixed Imbalanced Delivery and Pickup Logistics Services

Yi, Jinhui, Yan, Huan, Wang, Haotian, Yuan, Jian, Li, Yong

Accurately estimating package delivery time is essential to the logistics industry, which enables reasonable work allocation and on-time service guarantee. This becomes even more necessary in mixed logistics scenarios where couriers handle a high volume of delivery and a smaller number of pickup simultaneously. However, most of the related works treat the pickup and delivery patterns on couriers' decision behavior equally, neglecting that the pickup has a greater impact on couriers' decision-making compared to the delivery due to its tighter time constraints. In such context, we have three main challenges: 1) multiple spatiotemporal factors are intricately interconnected, significantly affecting couriers' delivery behavior; 2) pickups have stricter time requirements but are limited in number, making it challenging to model their effects on couriers' delivery process; 3) couriers' spatial mobility patterns are critical determinants of their delivery behavior, but have been insufficiently explored. To deal with these, we propose TransPDT, a Transformer-based multi-task package delivery time prediction model. We first employ the Transformer encoder architecture to capture the spatio-temporal dependencies of couriers' historical travel routes and pending package sets. Then we design the pattern memory to learn the patterns of pickup in the imbalanced dataset via attention mechanism. We also set the route prediction as an auxiliary task of delivery time prediction, and incorporate the prior courier spatial movement regularities in prediction. Extensive experiments on real industry-scale datasets demonstrate the superiority of our method. A system based on TransPDT is deployed internally in JD Logistics to track more than 2000 couriers handling hundreds of thousands of packages per day in Beijing.

artificial intelligence, courier, machine learning, (19 more...)

doi: 10.1145/3678717.3691266

2505.00375

Country:

North America > United States (0.30)
Asia > China > Beijing > Beijing (0.26)

Genre: Research Report (0.40)

Industry: Transportation > Freight & Logistics Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.89)

Temporal Attention Evolutional Graph Convolutional Network for Multivariate Time Series Forecasting

Zhao, Xinlong, Zhang, Liying, Zou, Tianbo, Zhang, Yan

Multivariate time series forecasting enables the prediction of future states by leveraging historical data, thereby facilitating decision-making processes. Each data node in a multivariate time series encompasses a sequence of multiple dimensions. These nodes exhibit interdependent relationships, forming a graph structure. While existing prediction methods often assume a fixed graph structure, many real-world scenarios involve dynamic graph structures. Moreover, interactions among time series observed at different time scales vary significantly. To enhance prediction accuracy by capturing precise temporal and spatial features, this paper introduces the Temporal Attention Evolutional Graph Convolutional Network (TAEGCN). This novel method not only integrates causal temporal convolution and a multi-head self-attention mechanism to learn temporal features of nodes, but also construct the dynamic graph structure based on these temporal features to keep the consistency of the changing in spatial feature with temporal series. TAEGCN adeptly captures temporal causal relationships and hidden spatial dependencies within the data. Furthermore, TAEGCN incorporates a unified neural network that seamlessly integrates these components to generate final predictions. Experimental results conducted on two public transportation network datasets, METR-LA and PEMS-BAY, demonstrate the superior performance of the proposed model.

data mining, graph structure, machine learning, (17 more...)

2505.00302

Country:

North America (0.29)
Asia > China (0.15)

Genre: Research Report > Promising Solution (0.34)

Industry: Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.87)

Ogezi, Michael, Shi, Freda

SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data

arXiv.org Artificial IntelligenceApr-30-2025

Vision-language models (VLMs) work well in tasks ranging from image captioning to visual question answering (VQA), yet they struggle with spatial reasoning, a key skill for understanding our physical world that humans excel at. We find that spatial relations are generally rare in widely used VL datasets, with only a few being well represented, while most form a long tail of underrepresented relations. This gap leaves VLMs ill-equipped to handle diverse spatial relationships. To bridge it, we construct a synthetic VQA dataset focused on spatial reasoning generated from hyper-detailed image descriptions in Localized Narratives, DOCCI, and PixMo-Cap. Our dataset consists of 455k samples containing 3.4 million QA pairs. Trained on this dataset, our Spatial-Reasoning Enhanced (SpaRE) VLMs show strong improvements on spatial reasoning benchmarks, achieving up to a 49% performance gain on the What's Up benchmark, while maintaining strong results on general tasks. Our work narrows the gap between human and VLM spatial reasoning and makes VLMs more capable in real-world tasks such as robotics and navigation.

artificial intelligence, natural language, spatial reasoning, (16 more...)

2504.20648

Country: Europe > Switzerland (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)