Spatial Task


SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models

Chen, Pingyi, Lou, Yujing, Cao, Shen, Guo, Jinhui, Fan, Lubin, Wu, Yue, Yang, Lin, Ma, Lizhuang, Ye, Jieping

arXiv.org Artificial Intelligence

While vision language models (VLMs) excel at 2D semantic visual understanding, their ability to reason quantitatively about 3D spatial relationships remains under-explored, largely because 2D images convey limited spatial information. In this paper, we analyze the problems that hinder VLMs' spatial understanding and propose SD-VLM, a novel framework that significantly enhances the fundamental spatial perception abilities of VLMs through two key contributions: (1) the Massive Spatial Measuring and Understanding (MSMU) dataset with precise spatial annotations, and (2) a simple depth positional encoding method that strengthens VLMs' spatial awareness. The MSMU dataset covers a broad range of quantitative spatial tasks with 700K QA pairs, 2.5M physical numerical annotations, and 10K chain-of-thought augmented samples. We have trained SD-VLM, a strong generalist VLM that shows superior quantitative spatial measuring and understanding capability. SD-VLM not only achieves state-of-the-art performance on our proposed MSMU-Bench, but also generalizes to other spatial understanding benchmarks, including Q-Spatial and SpatialRGPT-Bench. Extensive experiments demonstrate that SD-VLM outperforms GPT-4o and Intern-VL3-78B by 26.91% and 25.56%, respectively, on MSMU-Bench. Code and models are released at https://github.com/cpystan/SD-VLM.
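
The abstract does not spell out the form of the depth positional encoding, so the snippet below is only a minimal sketch of one plausible reading: an embedding of each patch's estimated metric depth is added to its visual token before the tokens reach the language model. The module, parameter, and variable names (DepthPositionalEncoding, max_depth_m, patch_depth_m) are hypothetical and are not the released SD-VLM code.

import torch
import torch.nn as nn

class DepthPositionalEncoding(nn.Module):
    """Hypothetical sketch: inject per-patch depth into visual tokens.

    Assumes each image patch token comes with a metric depth value (e.g. the
    median depth of the patch from a monocular depth estimator). This is an
    illustration of depth-encoded visual features, not SD-VLM's implementation.
    """

    def __init__(self, hidden_dim: int, num_freqs: int = 16, max_depth_m: float = 20.0):
        super().__init__()
        self.max_depth_m = max_depth_m
        # Sinusoidal frequencies over normalized depth, projected to the token width.
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs).float())
        self.proj = nn.Linear(2 * num_freqs, hidden_dim)

    def forward(self, patch_tokens: torch.Tensor, patch_depth_m: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, hidden_dim); patch_depth_m: (B, N), in meters.
        d = (patch_depth_m / self.max_depth_m).clamp(0.0, 1.0).unsqueeze(-1)   # (B, N, 1)
        angles = d * self.freqs                                                # (B, N, num_freqs)
        depth_code = torch.cat([angles.sin(), angles.cos()], dim=-1)           # (B, N, 2*num_freqs)
        return patch_tokens + self.proj(depth_code)

# Usage (shapes only): DepthPositionalEncoding(4096)(vision_tokens, depth_per_patch)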


Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study

Xu, Liuchang, Zhao, Shuo, Lin, Qingming, Chen, Luyao, Luo, Qianqian, Wu, Sensen, Ye, Xinyue, Feng, Hailin, Du, Zhenhong

arXiv.org Artificial Intelligence

The advent of large language models such as ChatGPT, Gemini, and others has underscored the importance of evaluating their diverse capabilities, ranging from natural language understanding to code generation. However, their performance on spatial tasks has not been comprehensively assessed. This study addresses that gap by introducing a novel multi-task spatial evaluation dataset, designed to systematically explore and compare the performance of several advanced models on spatial tasks. The dataset encompasses twelve distinct task types, including spatial understanding and path planning, each with verified, accurate answers. We evaluated multiple models, including OpenAI's gpt-3.5-turbo and gpt-4o and ZhipuAI's glm-4, through a two-phase testing approach: first zero-shot testing, then prompt-tuning tests on the dataset categorized by difficulty. Results indicate that gpt-4o achieved the highest overall accuracy in the first phase, with an average of 71.3%. Although moonshot-v1-8k slightly underperformed overall, it surpassed gpt-4o on place-name recognition tasks. The study also highlights the impact of prompt strategies on model performance in specific tasks. For example, the Chain-of-Thought (CoT) strategy increased gpt-4o's accuracy in path planning from 12.4% to 87.5%, while a one-shot strategy raised moonshot-v1-8k's accuracy on mapping tasks from 10.1% to 76.3%.
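
As a rough illustration of the two-phase protocol, the sketch below scores a model on the same questions under a zero-shot prompt and under a chain-of-thought prompt, then compares accuracies against the verified answers. The query_model callable and the substring-match scoring are assumptions standing in for whatever chat API and answer checker the study actually uses.

from typing import Callable

ZERO_SHOT = "Answer the following spatial question with the final answer only.\n{q}"
COT = ("Answer the following spatial question. Think step by step, "
       "then give the final answer on the last line.\n{q}")

def accuracy(query_model: Callable[[str], str],
             dataset: list[dict],   # each item: {"question": ..., "answer": ...}
             template: str) -> float:
    """Score one model on one prompt template against verified answers."""
    correct = 0
    for item in dataset:
        reply = query_model(template.format(q=item["question"]))
        # Crude check: the verified answer string appears in the model's reply.
        if item["answer"].strip().lower() in reply.strip().lower():
            correct += 1
    return correct / max(len(dataset), 1)

# Phase comparison for one task category, e.g. path planning:
# zero_shot_acc = accuracy(query_model, path_planning_items, ZERO_SHOT)
# cot_acc       = accuracy(query_model, path_planning_items, COT)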


Referee-Meta-Learning for Fast Adaptation of Locational Fairness

Chen, Weiye, Xie, Yiqun, Jia, Xiaowei, He, Erhu, Bao, Han, An, Bang, Zhou, Xun

arXiv.org Artificial Intelligence

When dealing with data from distinct locations, machine learning algorithms tend to show an implicit preference for some locations over others, a bias that undermines the spatial fairness of the algorithm. Given the broad adoption of learning-based solutions in practice, this unfairness can easily propagate into subsequent decision-making. However, locational biases in AI are largely understudied. To mitigate biases over locations, we propose a locational meta-referee (Meta-Ref) to oversee the few-shot meta-training and meta-testing of a deep neural network. Meta-Ref dynamically adjusts the learning rates for training samples from given locations to promote fair performance across locations, through an explicit consideration of locational biases and the characteristics of the input data. We present a three-phase training framework to learn both a meta-learning-based predictor and an integrated Meta-Ref that governs the fairness of the model. Once trained on a distribution of spatial tasks, Meta-Ref is applied to samples from new spatial tasks (i.e., regions outside the training area) to promote fairness during the fine-tuning step. We carried out experiments with two case studies on crop monitoring and transportation safety, which show that Meta-Ref can improve locational fairness while keeping the overall prediction quality at a similar level.
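
The sketch below illustrates the core mechanism described here: a small referee network maps location features to learning-rate multipliers that weight each location's contribution to an inner-loop update. The MAML-style inner step, the network shape, and all names are assumptions; they do not reproduce the paper's three-phase training framework.

import torch
import torch.nn as nn

class Referee(nn.Module):
    """Hypothetical referee: location features -> positive learning-rate multiplier."""

    def __init__(self, loc_feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(loc_feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, loc_feats: torch.Tensor) -> torch.Tensor:
        # Softplus keeps the per-location multipliers positive.
        return nn.functional.softplus(self.net(loc_feats)).squeeze(-1)

def inner_step(predictor: nn.Module, referee: Referee, base_lr: float,
               batches_by_location: dict) -> None:
    """One fairness-aware inner update over the locations of a spatial task.

    batches_by_location maps a location id to (loc_feats, (x, y)).
    """
    losses = []
    for loc_feats, (x, y) in batches_by_location.values():
        lr_scale = referee(loc_feats).mean()          # how strongly to weight this location
        losses.append(lr_scale * nn.functional.mse_loss(predictor(x), y))
    loss = torch.stack(losses).sum()
    grads = torch.autograd.grad(loss, list(predictor.parameters()))
    with torch.no_grad():
        for p, g in zip(predictor.parameters(), grads):
            p -= base_lr * g                          # update scaled by the referee's weights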


Why It's Notoriously Difficult to Compare AI and Human Perception

#artificialintelligence

Science fiction is becoming reality as increasingly intelligent machines gradually emerge -- ones that not only specialize in things like chess, but can also carry out higher-level reasoning, or even answer deep philosophical questions. For the past few decades, experts have been working toward the creation of such a human-like artificial intelligence, a so-called "strong" AI or artificial general intelligence (AGI), which can learn to perform a wide range of tasks as easily as a human might. But while current AI development may take some inspiration from the neuroscience of the human brain, is it actually appropriate to compare the way AI processes information with the way humans do? The answer depends on how experiments are set up and how AI models are structured and trained, according to new research from a team at the University of Tübingen and other German research institutes. The team's study suggests that, because AI and humans arrive at perceptual decisions in different ways, generalizations drawn from such comparisons may not be fully reliable, especially if machines are used to automate critical tasks.

