If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
The visual commonsense reasoning task aims to lead the research field toward solving cognition-level reasoning: a model must predict correct answers while also providing convincing reasoning paths, which results in three sub-tasks, i.e., Q→A, QA→R and Q→AR. It poses great challenges for proper semantic alignment between the vision and language domains and for knowledge reasoning that generates persuasive reasoning paths. Existing works either resort to a powerful end-to-end network that cannot produce interpretable reasoning paths or solely explore the intra-relationships of visual objects (a homogeneous graph) while ignoring the cross-domain semantic alignment among visual concepts and linguistic words. In this paper, we propose a new Heterogeneous Graph Learning (HGL) framework for seamlessly integrating intra-graph and inter-graph reasoning in order to bridge the vision and language domains. Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.
Artificial intelligence programs are extremely good at finding subtle patterns in enormous amounts of data, but don't understand the meaning of anything. Whether you are searching the Internet on Google, browsing your news feed on Facebook, or finding the quickest route on a traffic app like Waze, an algorithm is at the root of it. Algorithms have permeated our daily lives; they help to simplify, distill, process, and provide insights from massive amounts of data. According to Ernest Davis, a professor of computer science at New York University's Courant Institute of Mathematical Sciences whose research centers on the automation of common-sense reasoning, the technologies that currently exist for artificial intelligence (AI) programs are extremely good at finding subtle patterns in enormous amounts of data. "One way or another," he says, "that is how they work."
The task of Visual Commonsense Reasoning is extremely challenging in the sense that the model must not only answer a question given an image, but also learn to reason. The baselines introduced for this task are quite limiting because two networks are trained separately to predict answers and rationales. The question and image are used as input to train the answer prediction network, while the question, image and correct answer are used as input to the rationale prediction network. Because the rationale is conditioned on the correct answer, this setup assumes that the Visual Question Answering task can be solved without any error, which is overambitious. Moreover, such an approach turns answer and rationale prediction into two completely independent VQA tasks, rendering the cognition task meaningless. In this paper, we seek to address these issues by proposing an end-to-end trainable model which considers both answers and their reasons jointly. Specifically, we first predict the answer to the question and then use the chosen answer to predict the rationale. However, a trivial design of such a model is non-differentiable, which makes it difficult to train. We solve this issue by proposing four approaches: softmax, Gumbel-softmax, reinforcement learning based sampling, and direct cross entropy against all pairs of answers and rationales. We demonstrate through experiments that our model performs competitively against the current state of the art. We conclude with an analysis of the presented approaches and discuss avenues for further work.
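To make the non-differentiability issue concrete, here is a minimal sketch of the Gumbel-softmax relaxation in plain Python. It is not the authors' implementation: the function names, the toy logits and the idea of feeding a weighted mix of answer embeddings to the rationale stage are assumptions made for illustration. A hard argmax over answer scores blocks gradients, whereas a soft weighting over the candidate answers keeps every step smooth.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Relaxed (soft) sample from the categorical distribution given by logits.

    Adds Gumbel(0, 1) noise to each logit, then applies a temperature-scaled
    softmax. Lower temperatures give weights closer to a one-hot choice.
    """
    noisy = []
    for x in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def soft_answer_embedding(weights, answer_embeddings):
    """Weighted average of answer embeddings (hypothetical rationale input)."""
    dim = len(answer_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, answer_embeddings))
        for d in range(dim)
    ]


if __name__ == "__main__":
    answer_logits = [2.0, 0.5, 0.1, -1.0]  # toy scores for four candidate answers
    weights = gumbel_softmax(answer_logits, temperature=0.5)
    print(weights)
```

In a real model the same computation would be expressed in an autodiff framework so that gradients from the rationale loss can flow back into the answer scores.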
Automatic KB completion for commonsense knowledge graphs (e.g., ATOMIC and ConceptNet) poses unique challenges compared to the much-studied conventional knowledge bases (e.g., Freebase). Commonsense knowledge graphs use free-form text to represent nodes, resulting in orders of magnitude more nodes compared to conventional KBs (18x more nodes in ATOMIC compared to Freebase (FB15K-237)). Importantly, this implies significantly sparser graph structures - a major challenge for existing KB completion methods that assume densely connected graphs over a relatively smaller set of nodes. In this paper, we present novel KB completion models that can address these challenges by exploiting the structural and semantic context of nodes. Specifically, we investigate two key ideas: (1) learning from local graph structure, using graph convolutional networks and automatic graph densification and (2) transfer learning from pre-trained language models to knowledge graphs for enhanced contextual representation of knowledge. We describe our method to incorporate information from both these sources in a joint model and provide the first empirical results for KB completion on ATOMIC and evaluation with ranking metrics on ConceptNet. Our results demonstrate the effectiveness of language model representations in boosting link prediction performance and the advantages of learning from local graph structure (+1.5 points in MRR for ConceptNet) when training on subgraphs for computational efficiency. Further analysis of model predictions sheds light on the types of commonsense knowledge that language models capture well.
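For readers unfamiliar with the ranking metric quoted above, the following sketch shows how mean reciprocal rank (MRR) is commonly computed for link prediction. The function names, the toy scores and the tie-handling rule (rank equals one plus the number of strictly higher-scoring candidates) are assumptions for illustration, not details taken from the paper.

```python
def reciprocal_rank(scores, gold_index):
    """Reciprocal of the rank of the gold candidate among all candidates.

    The rank is 1 plus the number of candidates scoring strictly higher than
    the gold one (an optimistic tie-breaking choice made for this sketch).
    """
    gold_score = scores[gold_index]
    rank = 1 + sum(1 for s in scores if s > gold_score)
    return 1.0 / rank


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (scores, gold_index) pairs."""
    return sum(reciprocal_rank(s, g) for s, g in queries) / len(queries)


if __name__ == "__main__":
    # Each query: model scores for candidate tail nodes, plus the gold index.
    toy_queries = [
        ([0.9, 0.1, 0.3], 0),       # gold ranked 1st
        ([0.2, 0.8, 0.5], 2),       # gold ranked 2nd
        ([0.1, 0.2, 0.7, 0.4], 0),  # gold ranked 4th
    ]
    print(mean_reciprocal_rank(toy_queries))
```

A gain of 1.5 points in MRR on this scale corresponds to an increase of 0.015 in the value the function returns.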
What is meant by AI? What is the nature of intelligence? What are transhumanism and common sense reasoning? These are some of the questions which the book covers. The relationship between man and machine has fascinated people ever since the writing of Frankenstein, where we are warned about the unintended consequences of the use and development of technology. While scrutinizing AI, one profound question emerges as a natural result: what makes us truly human?
Recently, pretrained language models (e.g., BERT) have achieved great success on many downstream natural language understanding tasks and exhibit a certain level of commonsense reasoning ability. However, their performance on commonsense tasks is still far from that of humans. As a preliminary attempt, we propose a simple yet effective method to teach pretrained models commonsense reasoning by leveraging the structured knowledge in ConceptNet, the largest commonsense knowledge base (KB). Specifically, the structured knowledge in the KB allows us to construct various logical forms and then generate multiple-choice questions requiring commonsense logical reasoning. Experimental results demonstrate that, when refined on these training examples, the pretrained models consistently improve their performance on tasks that require commonsense reasoning, especially in the few-shot learning setting. We also perform an analysis to understand which logical relations are more relevant to commonsense reasoning.
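As a rough illustration of turning KB triples into multiple-choice training examples, the sketch below builds one question from a toy set of ConceptNet-style triples. Everything in it is hypothetical: the triples, the question templates and the distractor sampling rule are assumptions for this example, and it covers only single-triple questions rather than the richer logical forms the abstract describes.

```python
import random

# Toy (head, relation, tail) triples in the style of ConceptNet (hypothetical data).
TRIPLES = [
    ("bird", "CapableOf", "fly"),
    ("fish", "CapableOf", "swim"),
    ("dog", "CapableOf", "bark"),
    ("knife", "UsedFor", "cutting"),
    ("pen", "UsedFor", "writing"),
    ("oven", "UsedFor", "baking"),
]

# Hypothetical templates mapping a relation to a question string.
TEMPLATES = {
    "CapableOf": "Which ability is typical of '{head}'?",
    "UsedFor": "Which use is typical of '{head}'?",
}


def make_question(triple, triples, num_distractors=2, rng=random):
    """Build one multiple-choice question from a single triple.

    Distractors are tails of the same relation that are not known to be
    valid for this head, so the gold tail is the only supported answer.
    """
    head, relation, tail = triple
    valid = {t for h, r, t in triples if h == head and r == relation}
    pool = sorted({t for h, r, t in triples if r == relation} - valid)
    distractors = rng.sample(pool, min(num_distractors, len(pool)))
    choices = distractors + [tail]
    rng.shuffle(choices)
    return {
        "question": TEMPLATES[relation].format(head=head),
        "choices": choices,
        "answer": choices.index(tail),  # index of the gold choice
    }


if __name__ == "__main__":
    example = make_question(TRIPLES[3], TRIPLES, rng=random.Random(0))
    print(example)
```

Examples of this shape can then be fed to a pretrained model with a multiple-choice head for further training.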
To facilitate this, Rashkin et al. (2018) build the Event2Mind dataset and Sap et al. (2018) present the Atomic dataset, which mainly focus on nine If-Then reasoning types to describe causes, effects, intents and participant characteristics of events. Together with these datasets, a simple RNN-based encoder-decoder framework is proposed to conduct the If-Then reasoning. However, two challenging problems remain. First, as illustrated in Figure 1, given an event "PersonX finds a new job", there can be multiple plausible feelings of PersonX about that event (such as "needy/stressed out" and "relieved/joyful"). Previous work showed that for this one-to-many problem, conventional RNN-based encoder-decoder models tend to generate generic responses rather than meaningful and specific answers (Li et al., 2016; Serban et al., 2016). Second, as a commonsense reasoning problem, rich background knowledge is necessary for generating reasonable inferences. For example, as shown in Figure 1, the feeling of PersonX upon the event "PersonX finds a new job" could be multiple. However, once the context "PersonX was fired" is given, the plausible inferences are narrowed down to "needy" or "stressed out". To better solve these problems, we propose a context-aware variational autoencoder (CWVAE) together with a two-stage training procedure.
An intelligent system must be capable of performing automated reasoning as well as responding to a changing environment (for example, changing knowledge). To exhibit such intelligent behavior, a machine needs to understand its environment as well as be able to interact with it to achieve certain goals. To act rationally, a machine must be able to obtain information and understand it. Knowledge Representation (KR) is an important step of automated reasoning, where knowledge about the world is represented in such a way that a machine can understand and process it. The representation must also be able to accommodate changes in the world (i.e., new or updated knowledge).
This year's annual meeting of the Association for Computational Linguistics (ACL 2019) was bigger than ever. Although the conference received 75% more submissions than last year, the quality of the research papers remained high, and so the acceptance rates are almost the same. It is becoming more and more challenging to keep track of the latest research advances in your area with such an overwhelming number of good research papers coming out. So, for your convenience, we've picked up and summarized several interesting research papers that might have particularly useful applications in a business setting. These are also the papers that got lots of attention from the AI community, and most of these studies have been nominated for or awarded ACL Best Paper Awards.
Understanding narratives requires reading between the lines, which in turn requires interpreting the likely causes and effects of events, even when they are not mentioned explicitly. In this paper, we introduce Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. In stark contrast to most existing reading comprehension datasets where the questions focus on factual and literal understanding of the context paragraph, our dataset focuses on reading between the lines over a diverse collection of people's everyday narratives, asking such questions as "what might be the possible reason of ...?", or "what would have happened if ..." that require reasoning beyond the exact text spans in the context. To establish baseline performances on Cosmos QA, we experiment with several state-of-the-art neural architectures for reading comprehension, and also propose a new architecture that improves over the competitive baselines. Experimental results demonstrate a significant gap between machine (68.4%) and human performance (94%), pointing to avenues for future research on commonsense machine comprehension. The dataset, code and leaderboard are publicly available at https://wilburone.github.io/cosmos.