Qualitative Reasoning
Qualitative Event Perception: Leveraging Spatiotemporal Episodic Memory for Learning Combat in a Strategy Game
Hancock, Will, Forbus, Kenneth D.
Event perception refers to people's ability to carve up continuous experience into meaningful discrete events. We speak of finishing our morning coffee, mowing the lawn, leaving work, etc. as singular occurrences that are localized in time and space. In this work, we analyze how spatiotemporal representations can be used to automatically segment continuous experience into structured episodes, and how these descriptions can be used for analogical learning. These representations are based on Hayes' notion of histories and build upon existing work on qualitative episodic memory. Our agent automatically generates event descriptions of military battles in a strategy game and improves its gameplay by learning from this experience. Episodes are segmented based on changing properties in the world and we show evidence that they facilitate learning because they capture event descriptions at a useful spatiotemporal grain size. This is evaluated through our agent's performance in the game. We also show empirical evidence that the perceived spatial extent of episodes affects both their temporal duration and the overall number of cases generated.
Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding
Forbus, Kenneth D., Chen, Kezhen, Xu, Wangcheng, Usher, Madeline
One of the purposes of perception is to bridge between sensors and conceptual understanding. Marr's Primal Sketch combined initial edge-finding with multiple downstream processes to capture aspects of visual perception such as grouping and stereopsis. Given the progress made in multiple areas of AI since then, we have developed a new framework inspired by Marr's work, the Hybrid Primal Sketch, which combines computer vision components into an ensemble to produce sketch-like entities which are then further processed by CogSketch, our model of high-level human vision, to produce both more detailed shape representations and scene representations which can be used for data-efficient learning via analogical generalization. This paper describes our theoretical framework, summarizes several previous experiments, and outlines a new experiment in progress on diagram understanding.
TLEX: An Efficient Method for Extracting Exact Timelines from TimeML Temporal Graphs
Ocal, Mustafa, Xie, Ning, Finlayson, Mark
A timeline provides a total ordering of events and times, and is useful for a number of natural language understanding tasks. However, qualitative temporal graphs that can be derived directly from text -- such as TimeML annotations -- usually explicitly reveal only partial orderings of events and times. In this work, we apply prior work on solving point algebra problems to the task of extracting timelines from TimeML annotated texts, and develop an exact, end-to-end solution which we call TLEX (TimeLine EXtraction). TLEX transforms TimeML annotations into a collection of timelines arranged in a trunk-and-branch structure. As in prior work, TLEX checks the consistency of the temporal graph and solves it; however, it adds two novel functionalities. First, it identifies the specific relations involved in an inconsistency (which could then be manually corrected) and, second, it identifies sections of the timelines that have indeterminate order, information critical for downstream tasks such as aligning events from different timelines. We provide detailed descriptions and analysis of the algorithmic components in TLEX, and conduct experimental evaluations by applying TLEX to 385 TimeML annotated texts from four corpora. We show that 123 of the texts are inconsistent, 181 of them have more than one "real world" or main timeline, and there are 2,541 indeterminate sections across all four corpora. A sampling evaluation showed that TLEX is 98--100% accurate with 95% confidence along five dimensions: the ordering of time-points, the number of main timelines, the placement of time-points on main versus subordinate timelines, the connecting point of branch timelines, and the location of the indeterminate sections. We provide a reference implementation of TLEX, the extracted timelines for all texts, and the manual corrections of the inconsistent texts.
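To make the idea of checking a temporal graph for consistency and then ordering its time-points more concrete, here is a minimal Python sketch. It is not the TLEX algorithm or its reference implementation. It assumes a simplified setting with only two kinds of point constraints, "strictly before" and "equal", and the function name `order_time_points` and its input format are hypothetical, chosen for illustration only.

```python
from collections import defaultdict, deque


def order_time_points(points, before, equal):
    """Order time-points under 'a < b' and 'a = b' constraints.

    Hypothetical helper, not part of TLEX. Returns a list of groups of
    simultaneous points in one valid temporal order, or None when the
    constraints are inconsistent.
    """
    # Union-find: merge points that are constrained to be equal.
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in equal:
        parent[find(a)] = find(b)

    # Build a graph of strict-before edges between merged groups.
    succ = defaultdict(set)
    indeg = {find(p): 0 for p in points}
    for a, b in before:
        ra, rb = find(a), find(b)
        if ra == rb:
            return None  # a < b contradicts a = b
        if rb not in succ[ra]:
            succ[ra].add(rb)
            indeg[rb] += 1

    groups = defaultdict(list)
    for p in points:
        groups[find(p)].append(p)

    # Kahn's algorithm: a leftover node means a cycle, i.e. inconsistency.
    queue = deque(r for r in indeg if indeg[r] == 0)
    ordered = []
    while queue:
        r = queue.popleft()
        ordered.append(sorted(groups[r]))
        for s in succ[r]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    if len(ordered) != len(indeg):
        return None
    return ordered
```

A topological sort returns only one linearization. Whenever several groups are available in the queue at once, their relative order is not determined by the constraints, which loosely corresponds to the indeterminate sections the abstract describes.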
Physically Grounded Vision-Language Models for Robotic Manipulation
Gao, Jensen, Sarkar, Bidipta, Xia, Fei, Xiao, Ted, Wu, Jiajun, Ichter, Brian, Majumdar, Anirudha, Sadigh, Dorsa
Recent advances in vision-language models (VLMs) have led to improved performance on tasks such as visual question answering and image captioning. Consequently, these models are now well-positioned to reason about the physical world, particularly within domains such as robotic manipulation. However, current VLMs are limited in their understanding of the physical concepts (e.g., material, fragility) of common objects, which restricts their usefulness for robotic manipulation tasks that involve interaction and physical reasoning about such objects. To address this limitation, we propose PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations of common household objects. We demonstrate that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts, including generalization to held-out concepts, by capturing human priors of these concepts from visual appearance. We incorporate this physically-grounded VLM in an interactive framework with a large language model-based robotic planner, and show improved planning performance on tasks that require reasoning about physical object concepts, compared to baselines that do not leverage physically-grounded VLMs. We additionally illustrate the benefits of our physically-grounded VLM on a real robot, where it improves task success rates. We release our dataset and provide further details and visualizations of our results at https://iliad.stanford.edu/pg-vlm/.
Improved Algorithms for Allen's Interval Algebra by Dynamic Programming with Sublinear Partitioning
Eriksson, Leif, Lagerkvist, Victor
Allen's interval algebra is one of the most well-known calculi in qualitative temporal reasoning with numerous applications in artificial intelligence. Recently, there has been a surge of improvements in the fine-grained complexity of NP-hard reasoning tasks, improving the running time from the naive $2^{O(n^2)}$ to $O^*((1.0615n)^{n})$, with even faster algorithms for unit intervals and a bounded number of overlapping intervals (the $O^*(\cdot)$ notation suppresses polynomial factors). Despite these improvements the best known lower bound is still only $2^{o(n)}$ (under the exponential-time hypothesis) and major improvements in either direction seemingly require fundamental advances in computational complexity. In this paper we propose a novel framework for solving NP-hard qualitative reasoning problems which we refer to as dynamic programming with sublinear partitioning. Using this technique we obtain a major improvement of $O^*((\frac{cn}{\log{n}})^{n})$ for Allen's interval algebra. To demonstrate that the technique is applicable to more domains we apply it to a problem in qualitative spatial reasoning, the cardinal direction point algebra, and solve it in $O^*((\frac{cn}{\log{n}})^{2n/3})$ time. Hence, not only do we significantly advance the state-of-the-art for NP-hard qualitative reasoning problems, but we also obtain a novel algorithmic technique that is likely applicable to many problems where $2^{O(n)}$ time algorithms are unlikely.
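For readers unfamiliar with the calculus, the following Python sketch classifies a pair of concrete intervals into one of the 13 basic relations of Allen's interval algebra. It only illustrates the vocabulary of the algebra and does not implement the dynamic programming technique of the paper, which addresses the much harder problem of deciding whether a network of such constraints is satisfiable. The function name `allen_relation` and the tuple encoding of intervals are assumptions made for this example.

```python
def allen_relation(a, b):
    """Return the basic Allen relation that holds from interval a to interval b.

    Intervals are (start, end) pairs with start < end. Hypothetical helper
    for illustration only.
    """
    (a1, a2), (b1, b2) = a, b
    if a1 >= a2 or b1 >= b2:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint intervals, with or without a gap between them.
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"

    # Intervals sharing one or both endpoints.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"

    # Strict containment in either direction.
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"

    # Remaining case: proper partial overlap.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

For instance, `allen_relation((1, 3), (2, 5))` yields `"overlaps"` and `allen_relation((2, 3), (1, 5))` yields `"during"`.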
Qualitative structure from motion
Exact structure from motion is an ill-posed computation and therefore very sensitive to noise. In this work I describe how a qualitative shape representation, based on the sign of the Gaussian curvature, can be computed directly from motion disparities, without the computation of an exact depth map or the directions of surface normals. I show that humans can judge the curvature sense of three points undergoing 3D motion from two, three and four views with success rate significantly above chance. A simple RBF net has been trained to perform the same task.
Probabilistic Qualitative Localization and Mapping
Simultaneous localization and mapping (SLAM) is essential in numerous robotics applications, such as autonomous navigation. Traditional SLAM approaches infer the metric state of the robot along with a metric map of the environment. While existing algorithms exhibit good results, they are still sensitive to measurement noise, sensor quality, and data association and are still computationally expensive. Alternatively, some navigation and mapping missions can be achieved using only qualitative geometric information, an approach known as qualitative spatial reasoning (QSR). We contribute a novel probabilistic qualitative localization and mapping approach in this work. We infer both the qualitative map and the qualitative state of the camera poses (localization). For the first time, we also incorporate qualitative probabilistic constraints between camera poses (motion model), improving computation time and performance. Furthermore, we take advantage of qualitative inference properties to achieve very fast approximate algorithms with good performance. In addition, we show how to propagate probabilistic information between nodes in the qualitative map, which improves estimation performance and enables inference of unseen map nodes - an important building block for qualitative active planning. We also conduct a study that shows how well we can estimate unseen nodes. Our method particularly appeals to scenarios with few salient landmarks and low-quality sensors. We evaluate our approach in simulation and on a real-world dataset and show its superior performance and low complexity compared to the state-of-the-art. Our analysis also indicates good prospects for using qualitative navigation and planning in real-world scenarios.
A Multivariate Complexity Analysis of Qualitative Reasoning Problems
Eriksson, Leif, Lagerkvist, Victor
Qualitative reasoning is an important subfield of artificial intelligence where one describes relationships with qualitative, rather than numerical, relations. Many such reasoning tasks, e.g., Allen's interval algebra, can be solved in $2^{O(n \cdot \log n)}$ time, but single-exponential running times $2^{O(n)}$ are currently far out of reach. In this paper we consider single-exponential algorithms via a multivariate analysis consisting of a fine-grained parameter $n$ (e.g., the number of variables) and a coarse-grained parameter $k$ expected to be relatively small. We introduce the classes FPE and XE of problems solvable in $f(k) \cdot 2^{O(n)}$, respectively $f(k)^n$, time, and prove several fundamental properties of these classes. We proceed by studying temporal reasoning problems and (1) show that the Partially Ordered Time problem of effective width $k$ is solvable in $16^{kn}$ time and is thus included in XE, and (2) that the network consistency problem for Allen's interval algebra with no interval overlapping with more than $k$ others is solvable in $(2nk)^{2k} \cdot 2^{n}$ time and is included in FPE. Our multivariate approach is in no way limited to these two specific problems and may be a generally useful approach for obtaining single-exponential algorithms.
Schockaert
We introduce a framework for qualitative reasoning about directions in high-dimensional spaces, called EER, where our main motivation is to develop a form of commonsense reasoning about semantic spaces. The proposed framework is, however, more general; we show how qualitative spatial reasoning about points with several existing calculi can be reduced to the realisability problem for EER (or REER for short), including LR and calculi for reasoning about betweenness, collinearity and parallelism. Finally, we propose an efficient but incomplete inference method, and show its effectiveness for reasoning with EER as well as reasoning with some of the aforementioned calculi.
A Generalised Approach for Encoding and Reasoning with Qualitative Theories in Answer Set Programming
Baryannis, George, Tachmazidis, Ilias, Batsakis, Sotiris, Antoniou, Grigoris, Alviano, Mario, Papadakis, Emmanuel
Qualitative reasoning involves expressing and deriving knowledge based on qualitative terms such as natural language expressions, rather than strict mathematical quantities. Well over 40 qualitative calculi have been proposed so far, mostly in the spatial and temporal domains, with several practical applications such as naval traffic monitoring, warehouse process optimisation and robot manipulation. Even if a number of specialised qualitative reasoning tools have been developed so far, an important barrier to the wider adoption of these tools is that only qualitative reasoning is supported natively, when real-world problems most often require a combination of qualitative and other forms of reasoning. In this work, we propose to overcome this barrier by using ASP as a unifying formalism to tackle problems that require qualitative reasoning in addition to non-qualitative reasoning. A family of ASP encodings is proposed which can handle any qualitative calculus with binary relations. These encodings are experimentally evaluated using a real-world dataset based on a case study of determining optimal coverage of telecommunication antennas, and compared with the performance of two well-known dedicated reasoners. Experimental results show that the proposed encodings outperform one of the two reasoners, but fall behind the other, an acceptable trade-off given the added benefits of handling any type of reasoning as well as the interpretability of logic programs. This paper is under consideration for acceptance in TPLP.