AITopics | Spatial Reasoning

Although freely accessible, it is provided in individual files for each year, and is difficult to combine in order to analyze spatial or temporal trends.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Shropshire (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (0.94)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Security & Privacy (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

f4fdf676c3b21f20f8c391d929188386-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsOct-9-2025, 11:55:02 GMT

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Shropshire (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Public Health (1.00)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Security & Privacy (0.67)
(3 more...)

Add feedback

Discovering Intrinsic Spatial-Temporal Logic Rules to Explain Human Actions

Neural Information Processing SystemsOct-9-2025, 08:46:47 GMT

In this paper, we introduce a set of spatial-temporal logic rules as knowledge to explain human actions.

artificial intelligence, machine learning, predicate, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Middle East > Israel (0.04)
Asia > China > Hong Kong (0.04)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.90)

Add feedback

Diversifying Spatial-Temporal Perception for Video Domain Generalization Kun-Y u Lin

Neural Information Processing SystemsOct-9-2025, 04:49:46 GMT

However, existing video classification models rely on the i.i.d.

artificial intelligence, computer vision, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.36)

Add feedback

TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking

Liu, Jiahang, Qi, Yunpeng, Zhang, Jiazhao, Li, Minghan, Wang, Shaoan, Wu, Kui, Ye, Hanjing, Zhang, Hong, Chen, Zhibo, Zhong, Fangwei, Zhang, Zhizheng, Wang, He

arXiv.org Artificial IntelligenceOct-9-2025

Embodied Visual Tracking (EVT) is a fundamental ability that underpins practical applications, such as companion robots, guidance robots and service assistants, where continuously following moving targets is essential. Recent advances have enabled language-guided tracking in complex and unstructured scenes. However, existing approaches lack explicit spatial reasoning and effective temporal memory, causing failures under severe occlusions or in the presence of similar-looking distractors. To address these challenges, we present TrackVLA++, a novel Vision-Language-Action (VLA) model that enhances embodied visual tracking with two key modules, a spatial reasoning mechanism and a Target Identification Memory (TIM). The reasoning module introduces a Chain-of-Thought paradigm, termed Polar-CoT, which infers the target's relative position and encodes it as a compact polar-coordinate token for action prediction. Guided by these spatial priors, the TIM employs a gated update strategy to preserve long-horizon target memory, ensuring spatiotemporal consistency and mitigating target loss during extended occlusions. Extensive experiments show that TrackVLA++ achieves state-of-the-art performance on public benchmarks across both egocentric and multi-camera settings. On the challenging EVT-Bench DT split, TrackVLA++ surpasses the previous leading approach by 5.1 and 12, respectively. Furthermore, TrackVLA++ exhibits strong zero-shot generalization, enabling robust real-world tracking in dynamic and occluded scenarios.

arxiv preprint arxiv, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2510.07134

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.69)

Add feedback

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

Liu, Junli, Chen, Qizhi, Wang, Zhigang, Tang, Yiwen, Zhang, Yiting, Yan, Chi, Wang, Dong, Li, Xuelong, Zhao, Bin

arXiv.org Artificial IntelligenceOct-9-2025

Visual grounding (VG) aims to localize target objects in an image based on natural language descriptions. In this paper, we propose AerialVG, a new task focusing on visual grounding from aerial views. Compared to traditional VG, AerialVG poses new challenges, \emph{e.g.}, appearance-based grounding is insufficient to distinguish among multiple visually similar objects, and positional relations should be emphasized. Besides, existing VG models struggle when applied to aerial imagery, where high-resolution images cause significant difficulties. To address these challenges, we introduce the first AerialVG dataset, consisting of 5K real-world aerial images, 50K manually annotated descriptions, and 103K objects. Particularly, each annotation in AerialVG dataset contains multiple target objects annotated with relative spatial relations, requiring models to perform comprehensive spatial reasoning. Furthermore, we propose an innovative model especially for the AerialVG task, where a Hierarchical Cross-Attention is devised to focus on target regions, and a Relation-Aware Grounding module is designed to infer positional relations. Experimental results validate the effectiveness of our dataset and method, highlighting the importance of spatial reasoning in aerial visual grounding. The code and dataset will be released.

artificial intelligence, dataset, spatial reasoning, (12 more...)

arXiv.org Artificial Intelligence

2504.07836

Country: