The visual commonsense reasoning task aims to lead the research field toward cognition-level reasoning: a model must predict the correct answer and also provide a convincing reasoning path, which yields three sub-tasks, i.e., Q→A, QA→R, and Q→AR. The task poses great challenges for proper semantic alignment between the vision and linguistic domains and for the knowledge reasoning needed to generate persuasive reasoning paths. Existing works either resort to a powerful end-to-end network that cannot produce interpretable reasoning paths or explore only the intra-relationships of visual objects (a homogeneous graph) while ignoring the cross-domain semantic alignment between visual concepts and linguistic words. In this paper, we propose a new Heterogeneous Graph Learning (HGL) framework that seamlessly integrates intra-graph and inter-graph reasoning in order to bridge the vision and language domains. Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module that interactively refine reasoning paths for semantic agreement.
The "Naive Physics Manifesto" of Pat Hayes (1978) proposes a large-scale project to develop a formal theory encompassing the entire knowledge of physics of naive reasoners, expressed in a declarative symbolic form. The theory is organized in clusters of closely interconnected concepts and axioms. More recent work on the representation of commonsense physical knowledge has followed a somewhat different methodology. The goal has been to develop a competence theory powerful enough to justify commonsense physical inferences, and the research is organized in microworlds, each microworld covering a small range of physical phenomena. In this article, I compare the advantages and disadvantages of the two approaches.
The Winograd Schema Challenge has recently been proposed as an alternative to the Turing test. A Winograd Schema consists of a sentence and a question such that the answer to the question depends on the resolution of a definite pronoun in the sentence. The answer is fairly intuitive for humans but difficult for machines because it requires commonsense knowledge about the words or concepts in the sentence. In this paper, we propose a novel technique that semantically parses the text, searches for the needed commonsense knowledge, and uses that knowledge to answer the given question.
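To make the structure of a Winograd Schema concrete, the following minimal sketch encodes the well-known trophy and suitcase example and resolves the pronoun with a tiny hand-written lookup table. Everything here is an assumption made for illustration: the `WinogradSchema` class, the `relation` and `trait` fields (which stand in for the output of a semantic parser), and the `KNOWLEDGE` table are hypothetical and are not the technique proposed in the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    sentence: str
    question: str
    pronoun: str
    candidates: Tuple[str, str]  # the two possible referents of the pronoun
    relation: str                # hypothetical stand-in for a semantic parse
    trait: str                   # property asserted of the pronoun


# Hypothetical hand-written commonsense knowledge:
# (relation, trait) -> index of the candidate the pronoun refers to.
KNOWLEDGE = {
    ("does not fit in", "too big"): 0,    # the contained object is too big
    ("does not fit in", "too small"): 1,  # the container is too small
}


def answer(schema: WinogradSchema) -> Optional[str]:
    """Return the referent chosen by the lookup, or None if nothing applies."""
    index = KNOWLEDGE.get((schema.relation, schema.trait))
    return None if index is None else schema.candidates[index]


trophy_big = WinogradSchema(
    sentence="The trophy doesn't fit in the brown suitcase because it is too big.",
    question="What is too big?",
    pronoun="it",
    candidates=("the trophy", "the brown suitcase"),
    relation="does not fit in",
    trait="too big",
)
# The twin sentence differs in one word and flips the answer.
trophy_small = WinogradSchema(
    sentence="The trophy doesn't fit in the brown suitcase because it is too small.",
    question="What is too small?",
    pronoun="it",
    candidates=("the trophy", "the brown suitcase"),
    relation="does not fit in",
    trait="too small",
)

print(answer(trophy_big))
print(answer(trophy_small))
```

The two instances differ by a single word yet have different answers, which is why surface statistics alone tend not to suffice and some source of commonsense knowledge is needed.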
This paper presents a semantically grounded method for extracting commonsense knowledge. First, commonsense rules are identified, e.g., one cannot see imaginary objects. Second, those rules are combined with a basic semantic representation in order to infer commonsense knowledge facts, e.g., one cannot see a flying carpet. Further combinations of semantic relations with inferred commonsense facts are proposed and analyzed. Results show that this novel method is able to extract thousands of commonsense facts with little human interaction and high accuracy.
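The two-step idea above (a general rule combined with a semantic fact about a specific object) can be illustrated with a minimal sketch. The `Rule` class, the `(object, property)` pairs, and the `infer_facts` function are hypothetical names chosen for this example; the paper's actual semantic representation is richer and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """A commonsense rule of the form: one cannot <verb> objects that are <prop>."""
    verb: str
    prop: str


def infer_facts(rules: List[Rule], properties: List[Tuple[str, str]]) -> List[str]:
    """Combine each rule with every (object, property) pair whose property matches."""
    facts = []
    for rule in rules:
        for obj, prop in properties:
            if prop == rule.prop:
                facts.append(f"one cannot {rule.verb} a {obj}")
    return facts


# Rule from the text: one cannot see imaginary objects.
rules = [Rule(verb="see", prop="imaginary")]

# Hypothetical semantic facts; only the first matches the rule.
properties = [("flying carpet", "imaginary"), ("wooden table", "real")]

facts = infer_facts(rules, properties)
print(facts)
```

Under these assumptions, the rule fires only for the object marked as imaginary, which mirrors the flying carpet example in the text.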
We believe that the flexibility and robustness of common sense reasoning come from analogical reasoning, learning, and generalization operating over massive amounts of experience. Million-fact knowledge bases are a good starting point, but are likely to be orders of magnitude smaller, in terms of ground facts, than will be needed to achieve human-like common sense reasoning. This paper describes the FIRE reasoning engine, which we have built to experiment with this approach. We discuss its knowledge base organization, including coarse-coding via mentions and a persistent TMS to achieve efficient retrieval while respecting the logical environment formed by contexts and their relationships in the KB. We describe its stratified reasoning organization, which supports both reflexive reasoning (Ask, Query) and deliberative reasoning (Solve, HTN planner). Analogical reasoning, learning, and generalization are supported as part of reflexive reasoning. To show the utility of these ideas, we describe how they are used in the Companion cognitive architecture, which has been used in a variety of reasoning and learning experiments.