A sizable fraction of current research into human visuo-spatial knowledge processing explicitly or implicitly suggests a spatial processing of certain knowledge types and a visual processing of others. Similarly, many formal and technical approaches for representing and processing visuo-spatial information in artificial intelligence, in computational cognitive modeling, or in knowledge representation and reasoning explicitly or implicitly treat visual and spatial information as belonging to separate types. While there exists good evidence for some differences in the mental processing of different visuo-spatial knowledge types, there is much less reason to maintain the currently ascribed separation between the visual and the spatial. We provide arguments for why strict dichotomies seem unwarranted with regard to descriptions of human mental spatial reasoning and disadvantageous for the formal and technical approaches. We build upon a synopsis of psychological evidence for the existence of multiple knowledge-type-specific representations in human visuo-spatial reasoning and discuss the notion of scalable representation structures. In the absence of proof to the contrary, it seems better practice to assume that (a) many of the type differences attributed to visuo-spatial knowledge processing are gradual rather than qualitative in nature, and that (b) tasks involving visuo-spatial knowledge of several types are often mentally processed through dynamic associations of structures for processing basal knowledge types. The paper calls for more investigations of human reasoning in visuo-spatial tasks in which knowledge types dynamically change during reasoning. It outlines a research framework for systematically investigating different basal visuo-spatial knowledge types and their combinations with regard to cognitive and computational plausibility.
We relate current research to the framework, including research on Casimir, our computational cognitive architecture for reasoning with visuo-spatial knowledge. We argue that a more systematic course of research along the lines of the proposed framework will not only lead to more appropriate descriptions of human cognition (regarding visuo-spatial knowledge processing) but may also spawn more integrated and versatile formal and technical approaches for dealing with visuo-spatial information.
This paper explores the use of methods and mechanisms from case-based reasoning for spatial cognition. In particular, it discusses approaches for tasks where an agent has knowledge available in various formats and needs to choose the most suitable one. The idea is to view the agent's repository of previously solved spatial problems as a case base and to store, along with each case, information about the representation used to solve that problem. Similarity measures can then be implemented that allow a new spatial problem to be compared to previously solved problems. Knowledge of the spatial representations used to solve previous problems can then help an agent reasoning with spatial knowledge to choose a suitable representation, based on the problem structure. Through the technique of case-based reasoning, we explore a possible answer to the question of how a software agent may choose one of several available spatial representations to perform a processing task.
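The retrieval idea described in this abstract can be sketched in a few lines: a case base of previously solved spatial problems, each annotated with the representation that solved it, queried via a similarity measure. All names here (the `Case` structure, the Jaccard similarity, the feature vocabulary) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of case-based selection of a spatial representation.
from dataclasses import dataclass

@dataclass
class Case:
    features: frozenset   # structural features of the solved spatial problem
    representation: str   # representation used to solve it (hypothetical labels)

def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap of problem features -- one plausible similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_representation(case_base, new_features):
    """Return the representation of the most similar previously solved case."""
    best = max(case_base, key=lambda c: similarity(c.features, new_features))
    return best.representation

case_base = [
    Case(frozenset({"connectivity", "containment"}), "topological"),
    Case(frozenset({"left-of", "front-of"}), "orientation-based"),
    Case(frozenset({"near", "far", "distance"}), "metric"),
]

print(suggest_representation(case_base, frozenset({"containment", "overlap"})))
# -> topological
```

A real system would of course use richer case descriptions and learned or domain-specific similarity measures; the point is only the control flow: retrieve the nearest solved case, reuse its representation choice.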
Human reasoning about spatial environments or spatial configurations is often based on spatio-analogical mental representations (mental images). Due to restrictions in processing and storage capacity in human working memory, mental images are constructed dynamically as highly problem-adequate representations to infer a desired spatial result. In this contribution, I argue for the investigation of reasoning processes in mental images with respect to the integration of different aspects of spatial knowledge processing. From an AI perspective, the properties of spatial knowledge processing in mental images point to a promising field of research in spatial and spatiotemporal reasoning. The following core features of knowledge processing in mental images are of special interest for spatial and spatiotemporal reasoning.
In this paper, we explore question answering based on spatial knowledge. We first consider a broad general-purpose axiomatic theory covering different aspects of qualitative spatial representation such as topology, orientation, distance, size, and shape. Since it can be expensive to build such a theory from scratch, we heuristically slice out a spatial subset of the Cyc knowledge base as a starting point for our work. We also explore a number of techniques to support efficient reasoning. The first is the RCC8 calculus, supported by the use of composition tables.
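Composition-table reasoning of the kind this abstract refers to can be illustrated with a small fragment of the RCC8 calculus. Only a few entries for the proper-part relations (TPP, NTPP) are encoded below; a full implementation covers all 8x8 pairs of base relations, and this sketch is not the paper's system.

```python
# Fragment of the RCC8 composition table (standard, verified entries only).
# TPP = tangential proper part, NTPP = non-tangential proper part.
TABLE = {
    ("TPP", "TPP"):   {"TPP", "NTPP"},
    ("TPP", "NTPP"):  {"NTPP"},
    ("NTPP", "TPP"):  {"NTPP"},
    ("NTPP", "NTPP"): {"NTPP"},
}

def compose(r1: str, r2: str) -> set:
    """Possible relations between a and c, given r1(a, b) and r2(b, c)."""
    return TABLE[(r1, r2)]

# If the kitchen is a tangential proper part of the flat (TPP), and the flat
# a tangential proper part of the building (TPP), the kitchen is some proper
# part of the building:
print(sorted(compose("TPP", "TPP")))   # ['NTPP', 'TPP']
```

The composition table is what makes such reasoning efficient: inferring the relation between a and c never requires geometric computation, only a lookup and, in constraint networks, intersection of the resulting relation sets.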
Spatial understanding is a fundamental problem with wide-reaching real-world applications. The representation of spatial knowledge is often modeled with spatial templates, i.e., regions of acceptability of two objects under an explicit spatial relationship (e.g., "on," "below," etc.). In contrast with prior work that restricts spatial templates to explicit spatial prepositions (e.g., "glass on table"), here we extend this concept to implicit spatial language, i.e., those relationships (generally actions) for which the spatial arrangement of the objects is only implied (e.g., "man riding horse"). In contrast with explicit relationships, predicting spatial arrangements from implicit spatial language requires significant common sense spatial understanding. Here, we introduce the task of predicting spatial templates for two objects under a relationship, which can be seen as a spatial question-answering task with a (2D) continuous output ("where is the man w.r.t. a horse when the man is walking the horse?"). We present two simple neural-based models that leverage annotated images and structured text to learn this task. The good performance of these models reveals that spatial locations are to a large extent predictable from implicit spatial language. Crucially, the models attain similar performance in a challenging generalized setting, where the object-relation-object combinations (e.g., "man walking dog") have never been seen before. Next, we go one step further by presenting the models with unseen objects (e.g., "dog"). In this scenario, we show that leveraging word embeddings enables the models to output accurate spatial predictions, proving that the models acquire solid common sense spatial knowledge allowing for such generalization.
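The interface of the task described here can be sketched minimally: map an (object, relation, object) triple, represented by word embeddings, to a continuous 2D output for the trajector's position relative to the landmark. The random embeddings, the single linear layer, and all names below are toy assumptions for illustration; the paper's actual models are neural networks trained on annotated images and structured text.

```python
# Toy sketch of the spatial-template prediction interface (not the paper's model).
import numpy as np

rng = np.random.default_rng(0)

# Placeholder word embeddings; in practice these are pretrained vectors,
# which is what enables generalization to unseen objects such as "dog".
EMB = {w: rng.normal(size=8) for w in ["man", "horse", "dog", "riding", "walking"]}

# Untrained linear layer mapping the concatenated triple embedding to (dx, dy).
W = rng.normal(size=(2, 24)) * 0.1

def predict_offset(subj: str, rel: str, obj: str) -> np.ndarray:
    """Predict a 2D offset of subj relative to obj under relation rel."""
    x = np.concatenate([EMB[subj], EMB[rel], EMB[obj]])
    return W @ x   # in the real models, the mapping is learned, not random

offset = predict_offset("man", "riding", "horse")
print(offset.shape)   # (2,)
```

Because the input is built from word embeddings rather than object identifiers, the same function accepts triples with objects never seen during training, which is the generalization setting the abstract highlights.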