AITopics

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)

Neural Information Processing Systems, Apr-6-2023, 17:21:18 GMT

Effects of Spatial and Temporal Contiguity on the Acquisition of Spatial Information

information, spatial and temporal contiguity, spatial information, (3 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)

Neural Information Processing Systems, Apr-6-2023, 17:07:00 GMT

A New Model of Spatial Representation in Multimodal Brain Areas

Most models of spatial representations in the cortex assume cells with limited receptive fields that are defined in a particular egocentric frame of reference. However, cells outside of primary sensory cortex are either gain modulated by postural input or partially shifting. We show that solving classical spatial tasks, like sensory prediction, multi-sensory integration, sensory-motor transformation and motor control, requires more complicated intermediate representations that are not invariant in one frame of reference. We present an iterative basis function map that performs these spatial tasks optimally with gain modulated and partially shifting units, and test it against neurophysiological and neuropsychological data. In order to perform an action directed toward an object, it is necessary to have a representation of its spatial location.

multimodal brain area, new model, spatial representation, (3 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.63)

Neural Information Processing Systems, Apr-6-2023, 14:08:36 GMT

Efficient Bregman Range Search

We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe the first algorithm for range search for an arbitrary Bregman divergence.
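To make the range search task in the abstract above concrete, here is a minimal brute-force sketch in Python. It is a linear-scan baseline, not the efficient algorithm the paper describes. The function names, the choice of the generalized KL divergence as the example Bregman divergence, and the argument order d(x, query) are all assumptions made for illustration; a Bregman divergence is generally asymmetric, so the order matters.

```python
import math

def kl_divergence(p, q):
    """Generalized KL divergence sum(p*log(p/q) - p + q), a Bregman divergence.

    Assumes all entries of p and q are strictly positive.
    """
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))

def squared_euclidean(p, q):
    """Squared Euclidean distance, the simplest (symmetric) Bregman divergence."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def range_search(database, query, radius, divergence):
    """Linear scan: indices of all points x with divergence(x, query) <= radius."""
    return [i for i, x in enumerate(database) if divergence(x, query) <= radius]

# Hypothetical toy database of 2-bin histograms.
database = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
query = [0.5, 0.5]
near = range_search(database, query, 0.1, kl_divergence)
wide = range_search(database, query, 0.4, kl_divergence)
```

The scan costs one divergence evaluation per database point, which is the cost an efficient range search method aims to avoid on large databases.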

algorithm, bregman divergence, efficient bregman range search

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.84)
Information Technology > Data Science > Data Mining (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.45)

Neural Information Processing Systems, Apr-6-2023, 13:16:56 GMT

Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

There has been a recent push toward extracting the 3D spatial layout of scenes. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves the performance of the state-of-the-art.

object and surface, spatial layout, volumetric reasoning

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.81)

Neural Information Processing Systems, Apr-6-2023, 12:13:30 GMT

Persistent Homology for Learning Densities with Bounded Support

We present a novel method for learning densities with bounded support which enables us to incorporate 'hard' topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of Persistent Homology can be combined with kernel-based methods from Machine Learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and, by incorporating Persistent Homology techniques in our approach, we are able to encode algebraic-topological constraints which are not addressed in current state-of-the-art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world data-set by learning a motion model for a racecar. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car.

bounded support, learning density, persistent homology, (1 more...)

Industry: Leisure & Entertainment > Sports > Motorsports (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.65)

arXiv.org Artificial Intelligence, Mar-31-2023

Grounding Object Relations in Language-Conditioned Robotic Manipulation with Semantic-Spatial Reasoning

Luo, Qian, Li, Yunfei, Wu, Yi

Grounded understanding of natural language in physical scenes can greatly benefit robots that follow human instructions. In object manipulation scenarios, existing end-to-end models are proficient at understanding semantic concepts, but typically cannot handle complex instructions involving spatial relations among multiple objects, which require both reasoning about object-level spatial relations and learning precise pixel-level manipulation affordances. We take an initial step toward this challenge with a decoupled two-stage solution. In the first stage, we propose an object-centric semantic-spatial reasoner to select which objects are relevant for the language-instructed task. The segmentations of the selected objects are then fused as additional input to the affordance learning stage. Simply incorporating the inductive bias of relevant objects into a vision-language affordance learning agent can effectively boost its performance in a custom testbed designed for object manipulation with spatial-related language instructions.

artificial intelligence, language instruction, spatial reasoning, (13 more...)

2303.17919

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Greater London > London (0.05)
Asia > China > Shaanxi Province > Xi'an (0.04)
(9 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.54)

arXiv.org Artificial Intelligence, Mar-30-2023

3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification

Zhang, Jiazhao, Dai, Liu, Meng, Fanpeng, Fan, Qingnan, Chen, Xuelin, Xu, Kai, Wang, He

Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Considering this task happens in 3D space, a 3D-aware agent can advance its ObjectNav capability via learning from fine-grained spatial information. However, leveraging 3D scene representation can be prohibitively impractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, run simultaneously, using online fused 3D points as observations. Through extensive experiments, we show that this framework can dramatically improve the performance in ObjectNav through learning from 3D scene representation. Our framework achieves the best performance among all modular-based methods on the Matterport3D and Gibson datasets, while requiring up to 30x less computational cost for training.

artificial intelligence, machine learning, spatial reasoning, (17 more...)

2212.00338

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)

arXiv.org Artificial Intelligence, Mar-30-2023

ProContEXT: Exploring Progressive Context Transformer for Tracking

Lan, Jin-Peng, Cheng, Zhi-Qi, He, Jun-Yan, Li, Chenyang, Luo, Bin, Bao, Xu, Xiang, Wangmeng, Geng, Yifeng, Xie, Xuansong

Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames. To this end, we revamp the tracking framework with the Progressive Context Encoding Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories. Specifically, ProContEXT leverages a context-aware self-attention module to encode the spatial and temporal context, refining and updating the multi-scale static and dynamic templates to progressively perform accurate tracking. It explores the complementarity between spatial and temporal context, opening a new pathway to multi-context modeling for transformer-based trackers. In addition, ProContEXT revises the token pruning technique to reduce computational complexity. Extensive experiments on popular benchmark datasets such as GOT-10k and TrackingNet demonstrate that the proposed ProContEXT achieves state-of-the-art performance.

artificial intelligence, machine learning, template, (17 more...)

2210.15511

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.30)

arXiv.org Artificial Intelligence, Mar-30-2023

Making Vision Transformers Efficient from A Token Sparsification View

Chang, Shuning, Wang, Pichao, Lin, Ming, Wang, Fan, Zhang, David Junhao, Jin, Rong, Shou, Mike Zheng

The computational complexity of Vision Transformers (ViTs), which is quadratic in the number of tokens, limits their practical applications. Several works propose to prune redundant tokens to achieve efficient ViTs. However, these methods generally suffer from (i) dramatic accuracy drops, (ii) application difficulty in the local vision transformer, and (iii) non-general-purpose networks for downstream tasks. In this work, we propose a novel Semantic Token ViT (STViT), for efficient global and local vision transformers, which can also be revised to serve as a backbone for downstream tasks. The semantic tokens represent cluster centers, and they are initialized by pooling image tokens in space and recovered by attention, which can adaptively represent global or local semantic information. Due to the cluster properties, a few semantic tokens can attain the same effect as a large number of image tokens, for both global and local vision transformers. For instance, only 16 semantic tokens on DeiT-(Tiny,Small,Base) can achieve the same accuracy with more than 100% inference speed improvement and nearly 60% FLOPs reduction; on Swin-(Tiny,Small,Base), we can employ 16 semantic tokens in each window to further speed it up by around 20% with a slight accuracy increase. Besides great success in image classification, we also extend our method to video recognition. In addition, we design an STViT-R(ecover) network to restore the detailed spatial information based on the STViT, making it work for downstream tasks, which previous token sparsification methods cannot handle. Experiments demonstrate that our method can achieve competitive results compared to the original networks in object detection and instance segmentation, with over 30% FLOPs reduction for the backbone. Code is available at http://github.com/changsn/STViT-R
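The abstract above says semantic tokens are initialized by pooling image tokens in space. A minimal Python sketch of that one step follows, assuming plain average pooling over non-overlapping blocks of a row-major token grid. The function name `pool_tokens`, the grid sizes, and the pooling scheme are hypothetical choices for illustration; the attention-based recovery step and the actual STViT implementation are not reproduced here.

```python
def pool_tokens(tokens, grid_h, grid_w, out_h, out_w):
    """Average-pool a row-major grid of token vectors into out_h * out_w tokens.

    Assumes grid_h and grid_w are divisible by out_h and out_w respectively.
    """
    if grid_h % out_h or grid_w % out_w:
        raise ValueError("grid size must be divisible by output size")
    block_h, block_w = grid_h // out_h, grid_w // out_w
    dim = len(tokens[0])
    pooled = []
    for oy in range(out_h):
        for ox in range(out_w):
            acc = [0.0] * dim
            # Sum every token vector inside this spatial block.
            for y in range(oy * block_h, (oy + 1) * block_h):
                for x in range(ox * block_w, (ox + 1) * block_w):
                    token = tokens[y * grid_w + x]
                    for d in range(dim):
                        acc[d] += token[d]
            pooled.append([a / (block_h * block_w) for a in acc])
    return pooled

# Hypothetical example: an 8x8 grid of 1-dimensional tokens pooled to 4x4 = 16.
image_tokens = [[float(i)] for i in range(64)]
semantic_tokens = pool_tokens(image_tokens, 8, 8, 4, 4)

# Self-attention over n tokens compares n * n token pairs.
pairs_before = len(image_tokens) ** 2
pairs_after = len(semantic_tokens) ** 2
```

In this toy setting, going from 64 to 16 tokens shrinks the number of attended token pairs from 4096 to 256, which illustrates why reducing the token count pays off quadratically. The figures are arithmetic on the toy grid, not results from the paper.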

artificial intelligence, natural language, text processing, (20 more...)

2303.08685

Country:

Asia > Singapore (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.36)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)