semantic parsing and reasoning
Explainable Image Understanding Using Vision and Reasoning
Aditya, Somak (Arizona State University)
Image understanding is fundamental to intelligent agents. Researchers have explored caption generation and visual question answering as independent aspects of image understanding (Johnson et al. 2015; Xiong, Merity, and Socher 2016). Common to most of the successful approaches is the learning of an end-to-end signal mapping (image to caption; image and question to answer). The accuracy is impressive, but it is also important to explain a decision to the end user, both to justify the results and to rectify them based on feedback. Very recently, there has been some focus (Hendricks et al. 2016; Liu et al.) on explaining some aspects of these learning systems. In my research, I look towards building explainable image understanding systems that can be used to generate captions and answer questions. Humans learn both from examples (learning) and by reading (knowledge). Inspired by this intuition, researchers have constructed knowledge bases that encode (probabilistic) commonsense and background knowledge. In this work, we look towards efficiently using this probabilistic knowledge on top of machine learning capabilities to rectify noise in visual detections and to generate captions or answers to posed questions.
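The abstract does not say how probabilistic knowledge is combined with noisy detections. The sketch below is one possible illustration, not the author's method. It assumes a hypothetical pairwise co-occurrence table (`cooccur`) in place of a real commonsense knowledge base and uses exhaustive search for the most probable joint labelling in place of a real reasoning engine. All region names, labels, scores and the `weight` parameter are invented for illustration.

```python
import itertools
import math

# Hypothetical detector output: each image region maps candidate labels to confidences.
detections = {
    "r1": {"person": 0.6, "dog": 0.4},
    "r2": {"surfboard": 0.45, "ironing_board": 0.55},
    "r3": {"wave": 0.7, "sky": 0.3},
}

# Hypothetical commonsense knowledge: probability that two labels co-occur in a scene.
cooccur = {
    frozenset({"person", "surfboard"}): 0.9,
    frozenset({"surfboard", "wave"}): 0.9,
    frozenset({"ironing_board", "wave"}): 0.05,
    frozenset({"person", "ironing_board"}): 0.5,
}
DEFAULT_PRIOR = 0.2  # assumed prior for label pairs the table does not mention


def joint_score(assignment, weight=1.0):
    """Sum of log detector confidences plus weighted log pairwise co-occurrence priors."""
    score = sum(math.log(detections[r][lab]) for r, lab in assignment.items())
    for a, b in itertools.combinations(assignment.values(), 2):
        score += weight * math.log(cooccur.get(frozenset({a, b}), DEFAULT_PRIOR))
    return score


def rectify(weight=1.0):
    """Exhaustively pick the highest-scoring joint labelling (feasible only for few regions)."""
    regions = list(detections)
    best = max(
        itertools.product(*(detections[r] for r in regions)),
        key=lambda labels: joint_score(dict(zip(regions, labels)), weight),
    )
    return dict(zip(regions, best))


print(rectify())  # {'r1': 'person', 'r2': 'surfboard', 'r3': 'wave'}
```

With `weight=0.0` only the detector scores count and region `r2` keeps the noisy label `ironing_board`. With the knowledge term switched on, the co-occurrence priors flip it to `surfboard`. Exhaustive search grows exponentially with the number of regions, so a practical system would need a proper probabilistic reasoner instead.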
Visual Commonsense for Scene Understanding Using Perception, Semantic Parsing and Reasoning
Aditya, Somak (Arizona State University) | Yang, Yezhou (University of Maryland, College Park) | Baral, Chitta (Arizona State University) | Fermuller, Cornelia (Associate Research Scientist, University of Maryland, College Park) | Aloimonos, Yiannis (University of Maryland, College Park)
In this paper, we explore the use of visual commonsense knowledge and other kinds of knowledge (such as domain knowledge, background knowledge, and linguistic knowledge) for scene understanding. In particular, we combine visual processing with techniques from natural language understanding (especially semantic parsing), commonsense reasoning, and knowledge representation and reasoning to improve visual perception and to reason about finer aspects of activities.