AITopics | Spatial Reasoning

Collaborating Authors

Spatial Reasoning

News Overviews Instructional Materials AI-Alerts Classics

Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings

arXiv.org Artificial IntelligenceMar-30-2023

Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks. As humans tend to use objects in many different ways depending on the scene and the objects' availability, learning object affordances in everyday-life scenarios is a challenging task, particularly in the presence of an open set of interactions and objects. We address the problem of affordance categorization for class-agnostic objects with an open set of interactions; we achieve this by learning similarities between object interactions in an unsupervised way and thus inducing clusters of object affordances. A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs (AGs), which abstract from the continuous representation of spatio-temporal interactions in RGB-D videos. These AGs are clustered to obtain groups of objects with similar affordances. Our experiments in a real-world scenario demonstrate that our method learns to create object affordance clusters with a high V-measure even in cluttered scenes. The proposed approach handles object occlusions by capturing effectively possible interactions and without imposing any object or scene constraints.

affordance, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2304.05989

Country:

Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
North America > United States (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Add feedback

Mask-Free Video Instance Segmentation

Ke, Lei, Danelljan, Martin, Ding, Henghui, Tai, Yu-Wing, Tang, Chi-Keung, Yu, Fisher

arXiv.org Artificial IntelligenceMar-28-2023

The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state. We leverage the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any labels. Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection. A consistency loss is then enforced on the found matches. Our mask-free objective is simple to implement, has no trainable parameters, is computationally efficient, yet outperforms baselines employing, e.g., state-of-the-art optical flow to enforce temporal mask consistency. We validate MaskFreeVIS on the YouTube-VIS 2019/2021, OVIS and BDD100K MOTS benchmarks. The results clearly demonstrate the efficacy of our method by drastically narrowing the gap between fully and weakly-supervised VIS performance. Our code and trained models are available at https://github.com/SysCV/MaskFreeVis.

artificial intelligence, machine learning, segmentation, (19 more...)

arXiv.org Artificial Intelligence

2303.15904

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Forecasting localized weather impacts on vegetation as seen from space with meteo-guided video prediction

Benson, Vitus, Requena-Mesa, Christian, Robin, Claire, Alonso, Lazaro, Cortés, José, Gao, Zhihan, Linscheid, Nora, Weynants, Mélanie, Reichstein, Markus

arXiv.org Artificial IntelligenceMar-28-2023

We present a novel approach for modeling vegetation response to weather in Europe as measured by the Sentinel 2 satellite. Existing satellite imagery forecasting approaches focus on photorealistic quality of the multispectral images, while derived vegetation dynamics have not yet received as much attention. We leverage both spatial and temporal context by extending state-of-the-art video prediction methods with weather guidance. We extend the EarthNet2021 dataset to be suitable for vegetation modeling by introducing a learned cloud mask and an appropriate evaluation scheme. Qualitative and quantitative experiments demonstrate superior performance of our approach over a wide variety of baseline methods, including leading approaches to satellite imagery forecasting. Additionally, we show how our modeled vegetation dynamics can be leveraged in a downstream task: inferring gross primary productivity for carbon monitoring. To the best of our knowledge, this work presents the first models for continental-scale vegetation modeling at fine resolution able to capture anomalies beyond the seasonal cycle, thereby paving the way for predictive assessments of vegetation status.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

2303.16198

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Romania > Nord-Vest Development Region > Bihor County > Oradea (0.04)
Europe > Poland (0.04)
(6 more...)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.56)
Food & Agriculture > Agriculture (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)
Information Technology > Modeling & Simulation (0.89)

Add feedback

Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution

Lu, Yunfan, Wang, Zipeng, Liu, Minjie, Wang, Hongjian, Wang, Lin

arXiv.org Artificial IntelligenceMar-28-2023

Event cameras sense the intensity changes asynchronously and produce event streams with high dynamic range and low latency. This has inspired research endeavors utilizing events to guide the challenging video superresolution (VSR) task. In this paper, we make the first attempt to address a novel problem of achieving VSR at random scales by taking advantages of the high temporal resolution property of events. This is hampered by the difficulties of representing the spatial-temporal information of events when guiding VSR. To this end, we propose a novel framework that incorporates the spatial-temporal interpolation of events to VSR in a unified framework. Our key idea is to learn implicit neural representations from queried spatial-temporal coordinates and features from both RGB frames and events. Our method contains three parts. Specifically, the Spatial-Temporal Fusion (STF) module first learns the 3D features from events and RGB frames. Then, the Temporal Filter (TF) module unlocks more explicit motion information from the events near the queried timestamp and generates the 2D features. Lastly, the SpatialTemporal Implicit Representation (STIR) module recovers the SR frame in arbitrary resolutions from the outputs of these two modules. In addition, we collect a real-world dataset with spatially aligned events and RGB frames. Extensive experiments show that our method significantly surpasses the prior-arts and achieves VSR with random scales, e.g., 6.5. Code and dataset are available at https: //vlis2022.github.io/cvpr23/egvsr.

artificial intelligence, machine learning, spatial reasoning, (19 more...)

arXiv.org Artificial Intelligence

2303.13767

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Philosophical Foundations of GeoAI: Exploring Sustainability, Diversity, and Bias in GeoAI and Spatial Data Science

Janowicz, Krzysztof

arXiv.org Artificial IntelligenceMar-27-2023

This chapter presents some of the fundamental assumptions and principles that could form the philosophical foundation of GeoAI and spatial data science. Instead of reviewing the well-established characteristics of spatial data (analysis), including interaction, neighborhoods, and autocorrelation, the chapter highlights themes such as sustainability, bias in training data, diversity in schema knowledge, and the (potential lack of) neutrality of GeoAI systems from a unifying ethical perspective. Reflecting on our profession's ethical implications will assist us in conducting potentially disruptive research more responsibly, identifying pitfalls in designing, training, and deploying GeoAI-based systems, and developing a shared understanding of the benefits but also potential dangers of artificial intelligence and machine learning research across academic fields, all while sharing our unique (geo)spatial perspective with others.

artificial intelligence, machine learning, spatial reasoning, (18 more...)

arXiv.org Artificial Intelligence

2304.06508

Country:

Europe > Austria > Vienna (0.14)
Europe > Ukraine (0.04)
Europe > Russia (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.93)
Energy (0.93)
Government (0.68)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Spatially-Aware Car-Sharing Demand Prediction

Mühlematter, Dominik J., Wiedemann, Nina, Xin, Yanan, Raubal, Martin

arXiv.org Artificial IntelligenceMar-25-2023

In recent years, car-sharing services have emerged as viable alternatives to private individual mobility, promising more sustainable and resource-efficient, but still comfortable transportation. Research on short-term prediction and optimization methods has improved operations and fleet control of car-sharing services; however, long-term projections and spatial analysis are sparse in the literature. We propose to analyze the average monthly demand in a station-based car-sharing service with spatially-aware learning algorithms that offer high predictive performance as well as interpretability. In particular, we compare the spatially-implicit Random Forest model with spatially-aware methods for predicting average monthly per-station demand. The study utilizes a rich set of socio-demographic, location-based (e.g., POIs), and car-sharing-specific features as input, extracted from a large proprietary car-sharing dataset and publicly available datasets. We show that the global Random Forest model with geo-coordinates as an input feature achieves the highest predictive performance with an R-squared score of 0.87, while local methods such as Geographically Weighted Regression perform almost on par and additionally yield exciting insights into the heterogeneous spatial distributions of factors influencing car-sharing behaviour. Additionally, our study offers effective as well as highly interpretable methods for diagnosing and planning the placement of car-sharing stations.

artificial intelligence, machine learning, prediction, (20 more...)

arXiv.org Artificial Intelligence

2303.14421

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Visual language maps for robot navigation – Google AI Blog

#artificialintelligenceMar-23-2023, 19:09:50 GMT

People are excellent navigators of the physical world, due in part to their remarkable ability to build cognitive maps that form the basis of spatial memory -- from localizing landmarks at varying ontological levels (like a book on a shelf in the living room) to determining whether a layout permits navigation from point A to point B. Building robots that are proficient at navigation requires an interconnected understanding of (a) vision and natural language (to associate landmarks or follow instructions), and (b) spatial reasoning (to connect a map representing an environment to the true spatial distribution of objects). While there have been many recent advances in training joint visual-language models on Internet-scale data, figuring out how to best connect them to a spatial representation of the physical world that can be used by robots remains an open research question. To explore this, we collaborated with researchers at the University of Freiburg and Nuremberg to develop Visual Language Maps (VLMaps), a map representation that directly fuses pre-trained visual-language embeddings into a 3D reconstruction of the environment. VLMaps, which is set to appear at ICRA 2023, is a simple approach that allows robots to (1) index visual landmarks in the map using natural language descriptions, (2) employ Code as Policies to navigate to spatial goals, such as "go in between the sofa and TV" or "move three meters to the right of the chair", and (3) generate open-vocabulary obstacle maps -- allowing multiple robots with different morphologies (mobile manipulators vs. drones, for example) to use the same VLMap for path planning. VLMaps can be used out-of-the-box without additional labeled data or model fine-tuning, and outperforms other zero-shot methods by over 17% on challenging object-goal and spatial-goal navigation tasks in Habitat and Matterport3D.

representation, robot, vlmap, (15 more...)

#artificialintelligence

Country:

Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.25)
Europe > Germany > Baden-Württemberg > Freiburg (0.25)

Genre: Research Report (0.77)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.56)
Information Technology > Artificial Intelligence > Games > Go (0.40)

Add feedback

Visual Spatial Reasoning

Liu, Fangyu, Emerson, Guy, Collier, Nigel

arXiv.org Artificial IntelligenceMar-22-2023

Spatial relations are a basic part of human cognition. However, they are expressed in natural language in a variety of ways, and previous work has suggested that current vision-and-language models (VLMs) struggle to capture relational information. In this paper, we present Visual Spatial Reasoning (VSR), a dataset containing more than 10k natural text-image pairs with 66 types of spatial relations in English (such as: under, in front of, and facing). While using a seemingly simple annotation format, we show how the dataset includes challenging linguistic phenomena, such as varying reference frames. We demonstrate a large gap between human and model performance: the human ceiling is above 95%, while state-of-the-art models only achieve around 70%. We observe that VLMs' by-relation performances have little correlation with the number of training examples and the tested models are in general incapable of recognising relations concerning the orientations of objects.

machine learning, natural language, relation, (19 more...)

arXiv.org Artificial Intelligence

2205.00363

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(14 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)

Add feedback

Urban Regional Function Guided Traffic Flow Prediction

Wang, Kuo, Liu, Lingbo, Liu, Yang, Li, Guanbin, Zhou, Fan, Lin, Liang

arXiv.org Artificial IntelligenceMar-19-2023

The prediction of traffic flow is a challenging yet crucial problem in spatial-temporal analysis, which has recently gained increasing interest. In addition to spatial-temporal correlations, the functionality of urban areas also plays a crucial role in traffic flow prediction. However, the exploration of regional functional attributes mainly focuses on adding additional topological structures, ignoring the influence of functional attributes on regional traffic patterns. Different from the existing works, we propose a novel module named POI-MetaBlock, which utilizes the functionality of each region (represented by Point of Interest distribution) as metadata to further mine different traffic characteristics in areas with different functions. Specifically, the proposed POI-MetaBlock employs a self-attention architecture and incorporates POI and time information to generate dynamic attention parameters for each region, which enables the model to fit different traffic patterns of various areas at different times. Furthermore, our lightweight POI-MetaBlock can be easily integrated into conventional traffic flow prediction models. Extensive experiments demonstrate that our module significantly improves the performance of traffic flow prediction and outperforms state-of-the-art methods that use metadata.

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Artificial Intelligence

2303.09789

Country:

Asia > China > Sichuan Province > Chengdu (0.05)
Asia > China > Shaanxi Province > Xi'an (0.05)
Asia > China > Hong Kong (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Transportation (1.00)
Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.75)

Add feedback

Understanding why shooters shoot -- An AI-powered engine for basketball performance profiling

Pascual, Alejandro Rodriguez, Mehta, Ishan, Khan, Muhammad, Rodriz, Frank, Yu, Rose

arXiv.org Artificial IntelligenceMar-16-2023

Understanding player shooting profiles is an essential part of basketball analysis: knowing where certain opposing players like to shoot from can help coaches neutralize offensive gameplans from their opponents; understanding where their players are most comfortable can lead them to developing more effective offensive strategies. An automatic tool that can provide these performance profiles in a timely manner can become invaluable for coaches to maximize both the effectiveness of their game plan as well as the time dedicated to practice and other related activities. Additionally, basketball is dictated by many variables, such as playstyle and game dynamics, that can change the flow of the game and, by extension, player performance profiles. It is crucial that the performance profiles can reflect the diverse playstyles, as well as the fast-changing dynamics of the game. We present a tool that can visualize player performance profiles in a timely manner while taking into account factors such as play-style and game dynamics. Our approach generates interpretable heatmaps that allow us to identify and analyze how non-spatial factors, such as game dynamics or playstyle, affect player performance profiles.

artificial intelligence, machine learning, spatial reasoning, (16 more...)

arXiv.org Artificial Intelligence

2303.09715

Country:

North America > United States > California > San Diego County > San Diego (0.05)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports > Basketball (0.70)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.69)

Add feedback