lrta
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering
Liang, Weixin, Niu, Feiyang, Reganti, Aishwarya, Thattai, Govind, Tur, Gokhan
The predominant approach to visual question answering (VQA) relies on encoding the image and question with a "black-box" neural encoder and decoding a single token as the answer like "yes" or "no". Despite this approach's strong quantitative results, it struggles to come up with intuitive, human-readable forms of justification for the prediction process. To address this insufficiency, we reformulate VQA as a full answer generation task, which requires the model to justify its predictions in natural language. We propose LRTA [Look, Read, Think, Answer], a transparent neural-symbolic reasoning framework for visual question answering that solves the problem step-by-step like humans and provides human-readable form of justification at each step. Specifically, LRTA learns to first convert an image into a scene graph and parse a question into multiple reasoning instructions. It then executes the reasoning instructions one at a time by traversing the scene graph using a recurrent neural-symbolic execution module. Finally, it generates a full answer to the given question with natural language justifications. Our experiments on GQA dataset show that LRTA outperforms the state-of-the-art model by a large margin (43.1% v.s. 28.0%) on the full answer generation task. We also create a perturbed GQA test set by removing linguistic cues (attributes and relations) in the questions for analyzing whether a model is having a smart guess with superficial data correlations. We show that LRTA makes a step towards truly understanding the question while the state-of-the-art model tends to learn superficial correlations from the training data.
Agent-Centered Search
In this article, I describe agent-centered search (also called real-time search or local search) and illustrate this planning paradigm with examples. Agent-centered search methods interleave planning and plan execution and restrict planning to the part of the domain around the current state of the agent, for example, the current location of a mobile robot or the current board position of a game. These methods can execute actions in the presence of time constraints and often have a small sum of planning and execution cost, both because they trade off planning and execution cost and because they allow agents to gather information early in nondeterministic domains, which reduces the amount of planning they have to perform for unencountered situations. These advantages become important as more intelligent systems are interfaced with the world and have to operate autonomously in complex environments. I researchers have studied in detail offline planning methods that first determine sequential or conditional plans (including reactive plans) and then execute them in the world.
Fast Subgoaling for Pathfinding via Real-Time Search
Hernandez, Carlos (Universidad Católica de la Santísima Concepción) | Baier, Jorge A. (Pontificia Universidad Católica de Chile)
Real-time heuristic search is a standard approach to pathfind- ing when agents are required to make decisions in a bounded, very short period of time. An assumption usually made in the development and evaluation of real-time algorithms is that the environment is unknown. Nevertheless, in many interesting applications such as pathfinding for automnomous characters in video games, the environment is known in advance. Recent real-time search algorithms such as D LRTA* and kNN LRTA* exploit knowledge about the environment while pathfinding under real-time constraints. Key to those algorithms is the computation of subgoals in a preprocessing step. Subgoals are subsequently used in the online planning phase to obtain high-quality solutions. Preprocessing in those algorithms, however, requires significant computation. In this paper we propose a novel preprocessing algorithm that generates subgoals using a series of backward search episodes carried out from potential goals. The result of a single backward search episode is a tree of subgoals that we then use while planning online. We show the advantages of our approach over state-of-the-art algorithms by carrying out experiments on standard real-time search benchmarks.