AITopics | spatial question


spatial question

Information about AI from the News, Publications, and Conferences


If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Warehouse Spatial Question Answering with LLM Agent

Huang, Hsiang-Wei, Cheng, Jen-Hao, Chen, Kuang-Ming, Yang, Cheng-Yen, Alattar, Bahaa, Lin, Yi-Ru, Kim, Pyongkun, Kim, Sangwon, Kim, Kwangju, Huang, Chung-I, Hwang, Jenq-Neng

arXiv.org Artificial Intelligence, Aug-15-2025

Spatial understanding has been a challenging task for existing Multi-modal Large Language Models (MLLMs). Previous methods rely on large-scale fine-tuning to enhance MLLMs' spatial understanding. In this paper, we present a data-efficient approach: an LLM agent system with strong spatial reasoning ability that can solve the challenging spatial question answering task in complex indoor warehouse scenarios. Our system integrates multiple tools that allow the LLM agent to perform spatial reasoning and interact with API tools to answer complex spatial questions. Extensive evaluations on the 2025 AI City Challenge Physical AI Spatial Intelligence Warehouse dataset demonstrate that our system achieves high accuracy and efficiency in tasks such as object retrieval, counting, and distance estimation. The code is available at https://github.com/hsiangwei0903/SpatialAgent
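The abstract describes an LLM agent that answers spatial questions by calling tools. As a minimal sketch of the tool-execution side only, assuming the scene has already been parsed into named objects with 3D centroids in metres, the Python below dispatches a tool call shaped like `{"tool": ..., "args": ...}`. The tool names `distance` and `count`, the `SceneObject` type, and the call format are hypothetical illustrations, not the SpatialAgent repository's API; in a real agent, the LLM would emit the tool call and read back the result.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical scene representation: a unique name, a category label,
    # and a 3D centroid in metres.
    name: str
    category: str
    center: tuple


def distance(scene, a, b):
    """Euclidean distance (metres) between the centroids of objects a and b."""
    by_name = {obj.name: obj for obj in scene}
    return math.dist(by_name[a].center, by_name[b].center)


def count(scene, category):
    """Number of objects in the scene with the given category label."""
    return sum(1 for obj in scene if obj.category == category)


# Registry the agent loop looks tool names up in (hypothetical tool set).
TOOLS = {"distance": distance, "count": count}


def run_tool_call(scene, call):
    """Execute one tool call shaped like {"tool": name, "args": {...}}."""
    tool = TOOLS[call["tool"]]
    return tool(scene, **call["args"])


scene = [
    SceneObject("pallet_0", "pallet", (0.0, 0.0, 0.0)),
    SceneObject("pallet_1", "pallet", (3.0, 4.0, 0.0)),
    SceneObject("transporter_0", "transporter", (1.0, 1.0, 0.0)),
]

# In a real agent, these dicts would come from the LLM's tool-call output.
print(run_tool_call(scene, {"tool": "distance", "args": {"a": "pallet_0", "b": "pallet_1"}}))  # 5.0
print(run_tool_call(scene, {"tool": "count", "args": {"category": "pallet"}}))  # 2
```

Keeping tools as plain functions behind a name-to-function registry means the model only has to produce a tool name and keyword arguments, while the numeric work (distances, counts) stays deterministic instead of being estimated by the LLM itself.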

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence: 2507.10778

Country: Asia > South Korea (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


SAT: Spatial Aptitude Training for Multimodal Language Models

Ray, Arijit, Duan, Jiafei, Tan, Reuben, Bashkirova, Dina, Hendrix, Rose, Ehsani, Kiana, Kembhavi, Aniruddha, Plummer, Bryan A., Krishna, Ranjay, Zeng, Kuo-Hao, Saenko, Kate

arXiv.org Artificial Intelligence, Dec-10-2024

Spatial perception is a fundamental component of intelligence. While many studies highlight that large multimodal language models (MLMs) struggle to reason about space, they only test static spatial reasoning, such as categorizing the relative positions of objects. Real-world deployment, however, requires dynamic capabilities such as perspective-taking and egocentric action recognition. As a roadmap to improving spatial intelligence, we introduce SAT, Spatial Aptitude Training, which goes beyond static relative object position questions to more dynamic tasks. SAT contains 218K question-answer pairs for 22K synthetic scenes, split into a training set and a test set. Because it is generated using a photo-realistic physics engine, our dataset can be arbitrarily scaled and easily extended to new actions, scenes, and 3D assets. We find that even MLMs that perform relatively well on static questions struggle to accurately answer dynamic spatial questions. Further, we show that SAT instruction-tuning data improves not only dynamic spatial reasoning on SAT but also zero-shot performance on existing real-image spatial benchmarks: 23% on CVBench, 8% on the harder BLINK benchmark, and 18% on VSR. When instruction-tuned on SAT, our 13B model matches larger proprietary MLMs like GPT4-V and Gemini-3-1.0 in spatial reasoning. Our data and code are available at http://arijitray1993.github.io/SAT/.
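Geometrically, a perspective-taking question of the kind SAT targets reduces to re-expressing an object's position in another viewer's egocentric frame. A minimal 2D sketch, assuming a top-down world frame and a heading angle measured counterclockwise from the world +x axis; the function names and the question template are hypothetical, and this is not SAT's generation code (SAT's scenes come from a photo-realistic physics engine).

```python
import math


def egocentric_side(agent_xy, heading, obj_xy, eps=1e-9):
    """Return "left", "right", or "neither" for obj as seen by the agent.

    agent_xy, obj_xy: 2D positions in a shared top-down world frame.
    heading: the agent's facing direction in radians, measured
    counterclockwise from the world +x axis (assumed convention).
    """
    dx = obj_xy[0] - agent_xy[0]
    dy = obj_xy[1] - agent_xy[1]
    # Unit vector pointing to the agent's left (forward rotated by +90 degrees).
    left_x, left_y = -math.sin(heading), math.cos(heading)
    lateral = dx * left_x + dy * left_y  # signed offset along the left axis
    if lateral > eps:
        return "left"
    if lateral < -eps:
        return "right"
    return "neither"  # object lies on the agent's forward/backward line


def perspective_qa(viewer_name, agent_xy, heading, obj_name, obj_xy):
    """Build one hypothetical perspective-taking question-answer pair."""
    question = (
        f"From the {viewer_name}'s point of view, "
        f"is the {obj_name} on the left or the right?"
    )
    return question, egocentric_side(agent_xy, heading, obj_xy)


# The same chair, seen by two viewers facing opposite directions.
print(perspective_qa("robot", (0.0, 0.0), 0.0, "chair", (1.0, 1.0)))
print(perspective_qa("person", (2.0, 0.0), math.pi, "chair", (1.0, 1.0)))
```

The same world position yields "left" for the robot and "right" for the person, which is exactly the distinction a static, viewer-independent relative-position question cannot test.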

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence: 2412.07755

Country:

South America > Brazil (0.04)
North America > United States > Rhode Island (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.81)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)


MQA: Answering the Question via Robotic Manipulation

Deng, Yuhong, Zhang, Naifu, Guo, Di, Liu, Huaping, Sun, Fuchun, Pang, Chen, Pang, Jing

arXiv.org Artificial Intelligence, Mar-10-2020

In this paper, we propose a novel task, Manipulation Question Answering (MQA), a class of Question Answering (QA) task in which the robot must find the answer to a question by actively interacting with the environment through manipulation. Considering a tabletop scenario, we generate a heatmap of the scene to give the robot a semantic understanding of it, and we propose an imitation learning approach with a semantic understanding metric to generate manipulation actions that guide the manipulator as it explores the tabletop to find the answer. In addition, we build a novel dataset containing a variety of tabletop scenarios and corresponding question-answer pairs. Extensive experiments have been conducted to validate the effectiveness of the proposed framework.
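The abstract does not say how the scene heatmap is built. As a hedged illustration of the general idea only, the sketch below assumes hypothetical detections of question-relevant objects (grid position plus a relevance score), adds an isotropic Gaussian for each onto a top-down grid of the tabletop, and takes the peak cell as a candidate region to explore. The paper instead generates manipulation actions with an imitation learning policy, which this sketch does not reproduce.

```python
import math


def semantic_heatmap(detections, width, height, sigma=1.0):
    """Accumulate one isotropic Gaussian per detection on a height x width grid.

    detections: list of (x, y, score) in grid-cell coordinates (hypothetical
    format); score weights how relevant the object is to the question.
    """
    grid = [[0.0] * width for _ in range(height)]
    for cx, cy, score in detections:
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                grid[y][x] += score * math.exp(-d2 / (2 * sigma ** 2))
    return grid


def peak_cell(grid):
    """Return the (x, y) cell with the highest heat value."""
    _, x, y = max(
        (value, x, y) for y, row in enumerate(grid) for x, value in enumerate(row)
    )
    return x, y


# Two hypothetical detections: a highly relevant object and a weakly relevant one.
heat = semantic_heatmap([(2, 3, 1.0), (7, 1, 0.3)], width=10, height=5)
print(peak_cell(heat))  # (2, 3): candidate region to manipulate next
```

A Gaussian per detection makes the map smooth, so nearby cells also score highly; a learned policy could consume the whole map rather than just its argmax.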

dataset, existence question, robot, (15 more...)

arXiv.org Artificial Intelligence: 2003.04641

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)


A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World

Platonov, Georgiy, Kane, Benjamin, Gindi, Aaron, Schubert, Lenhart K.

arXiv.org Artificial Intelligence, Nov-6-2019

The blocks world is a classic toy domain that has long been used to build and test spatial reasoning systems. Despite its relative simplicity, tackling this domain in its full complexity requires the agent to exhibit a rich set of functional capabilities, ranging from vision to natural language understanding. There is currently a resurgence of interest in solving problems in such limited domains using modern techniques. In this work we tackle spatial question answering in a holistic way, using a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modeling. The contributions of this work include a semantic parser that maps spatial questions into logical forms consistent with a general approach to meaning representation, a dialog manager based on a schema representation, and a constraint solver for spatial questions that provides answers in agreement with human perception. These and other components are integrated into a multi-modal human-computer interaction pipeline.
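The pipeline above maps a spatial question to a logical form and passes it to a constraint solver. A much-simplified sketch under stated assumptions (axis-aligned cubes, crisp geometric predicates, a viewer looking along +y with z up) evaluates a hypothetical logical form `(relation, "?x", ground)` by returning every block that satisfies the relation. The predicate definitions, tolerance, and tuple format are illustrative assumptions, not the paper's semantic parser or solver; the paper's solver aims for answers that agree with human perception, which crisp thresholds like these only approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # Axis-aligned cube: centre coordinates and edge length in metres (assumed).
    name: str
    x: float
    y: float
    z: float
    size: float = 0.1


def on(a, b, tol=0.01):
    """a rests on b: a's bottom face meets b's top face and their footprints overlap."""
    touching = abs((a.z - a.size / 2) - (b.z + b.size / 2)) <= tol
    half_sum = (a.size + b.size) / 2
    overlap = abs(a.x - b.x) < half_sum and abs(a.y - b.y) < half_sum
    return touching and overlap


def left_of(a, b):
    """Viewer looks along +y with z up, so +x is to the viewer's right."""
    return a.x < b.x


RELATIONS = {"on": on, "left_of": left_of}


def answer(logical_form, world):
    """Solve (relation, "?x", ground): every other block x with relation(x, ground)."""
    relation, _var, ground_name = logical_form
    rel = RELATIONS[relation]
    ground = world[ground_name]
    return sorted(
        name for name, blk in world.items() if name != ground_name and rel(blk, ground)
    )


world = {
    b.name: b
    for b in [
        Block("blue", 0.0, 0.0, 0.05),  # resting on the table (z = 0)
        Block("red", 0.0, 0.0, 0.15),   # stacked on blue
        Block("green", 0.3, 0.0, 0.05),
    ]
}

# "Which block is on the blue block?"
print(answer(("on", "?x", "blue"), world))  # ['red']
# "Which blocks are to the left of the green block?"
print(answer(("left_of", "?x", "green"), world))  # ['blue', 'red']
```

Separating the parse (the tuple) from the geometry (the predicate table) mirrors the division of labour the abstract describes: new relations can be added to `RELATIONS` without touching the query code.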

relation, spatial question, spatial relation, (16 more...)

arXiv.org Artificial Intelligence: 1911.02524

Country:

North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)

Genre: Research Report (0.50)

Industry: Consumer Products & Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
