Knowledge that Everyone Knows. "People do not walk on their heads." The assertion comes about 900 statements deep into the 527,308 items that comprise the Open Mind common sense database. It's after "Laws are the rules of society" and before "The sky is blue during the day." This collection of mundane facts, which would take more than 20,000 pages to print out, consists entirely of statements so unremarkable they are barely worth stating. Most of us would correctly dismiss them as common sense.
– from D.C. Denison, Guess who's smarter. Boston Globe Online (page hosted at MIT), May 26, 2003.
The Cyc project (initially planned from 1984 to 1994) is the world's longest-lived AI project. The idea was to create a machine with "common sense," and it was predicted that about 10 years should suffice to see significant results. That didn't quite work out, and today, after 35 years, the project is still going on -- although by now very few experts still believe in the promises made by Cyc's developers. Common sense is more than just explaining the meaning of words. For example, we have already seen how "sibling" or "daughter" can be explained in Prolog with a dictionary-like definition.
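The dictionary-like definitions mentioned above can be sketched as follows. This is a minimal illustration in Python rather than Prolog, and the facts (`PARENT_OF`, `FEMALE`) and the names in them are hypothetical, not taken from the original text. It shows only that "sibling" and "daughter" reduce to simple rules over stored facts; nothing in it captures the broader common sense the passage is concerned with.

```python
# Hypothetical toy facts; all names are illustrative assumptions.
# Each pair is (parent, child).
PARENT_OF = {
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
    ("bob", "dave"),
}
FEMALE = {"alice", "carol"}


def parents(person):
    """Return the set of known parents of `person`."""
    return {p for (p, c) in PARENT_OF if c == person}


def is_sibling(x, y):
    """x and y are siblings if they are distinct and share at least one parent."""
    return x != y and bool(parents(x) & parents(y))


def is_daughter(x, y):
    """x is a daughter of y if y is a parent of x and x is female."""
    return (y, x) in PARENT_OF and x in FEMALE
```

With these facts, `is_sibling("carol", "dave")` holds and `is_daughter("dave", "alice")` does not. A query such as "do people walk on their heads?" has no counterpart here, which is the gap between definitions and common sense that the passage points to.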
For more than five decades, DARPA has been a leader in generating groundbreaking research and development (R&D) that facilitated the advancement and application of rule-based and statistical-learning-based AI technologies. Today, DARPA continues to lead innovation in AI research as it funds a broad portfolio of R&D programs, ranging from basic research to advanced technology development. DARPA believes that a future in which systems are capable of acquiring new knowledge through generative contextual and explanatory models will be realized upon the development and application of "Third Wave" AI technologies. DARPA announced in September 2018 a multi-year investment of more than $2 billion in new and existing programs called the "AI Next" campaign. Key areas of the campaign include automating critical DoD business processes, such as security clearance vetting or accrediting software systems for operational deployment; improving the robustness and reliability of AI systems; enhancing the security and resiliency of machine learning and AI technologies; reducing power, data, and performance inefficiencies; and pioneering the next generation of AI algorithms and applications, such as "explainability" and common sense reasoning.
We introduce a new benchmark task for coreference resolution, Hard-CoRe, that targets common-sense reasoning and world knowledge. Previous coreference resolution tasks have been overly vulnerable to systems that simply exploit the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of sentences in naturally occurring text. With these limitations in mind, we present a resolution task that is both challenging and realistic. We demonstrate that various coreference systems, whether rule-based, feature-rich, graphical, or neural-based, perform at random or slightly above-random on the task, whereas human performance is very strong with high inter-annotator agreement. To explain this performance gap, we show empirically that state-of-the-art models often fail to capture context and rely only on the antecedents to make a decision.
The NLP and ML communities have long been interested in developing models capable of common-sense reasoning, and recent works have significantly improved the state of the art on benchmarks like the Winograd Schema Challenge (WSC). Despite these advances, the complexity of tasks designed to test common-sense reasoning remains under-analyzed. In this paper, we make a case study of the Winograd Schema Challenge and, based on two new measures of instance-level complexity, design a protocol that both clarifies and qualifies the results of previous work. Our protocol accounts for the WSC's limited size and variable instance difficulty, properties common to other common-sense benchmarks. Accounting for these properties when assessing model results may prevent unjustified conclusions.
When answering a question, people often draw upon their rich world knowledge in addition to some task-specific context. Recent work has focused primarily on answering questions based on some relevant document or content, and required very little general background. To investigate question answering with prior knowledge, we present CommonsenseQA: a difficult new dataset for commonsense question answering. To capture common sense beyond associations, each question discriminates between three target concepts that all share the same relationship to a single source drawn from ConceptNet (Speer et al., 2017). This constraint encourages crowd workers to author multiple-choice questions with complex semantics, in which all candidates relate to the subject in a similar way. We create 9,500 questions through this procedure and demonstrate the dataset's difficulty with a large number of strong baselines. Our best baseline, the OpenAI GPT (Radford et al., 2018), obtains 54.8% accuracy, well below human performance, which is 95.3%.
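The selection constraint described above, several target concepts that share one relation to a single source, can be sketched as a grouping step over knowledge-graph triples. This is a minimal sketch, not the authors' pipeline: the `TRIPLES` data and the `question_sets` helper are hypothetical, and the real ConceptNet data and crowd-sourcing steps are not modeled.

```python
from collections import defaultdict

# Hypothetical toy triples in (source, relation, target) form.
TRIPLES = [
    ("river", "AtLocation", "waterfall"),
    ("river", "AtLocation", "bridge"),
    ("river", "AtLocation", "valley"),
    ("river", "HasProperty", "wet"),
    ("dog", "CapableOf", "bark"),
]


def question_sets(triples, n_targets=3):
    """Group targets by (source, relation) and keep groups with enough targets.

    Each surviving group holds `n_targets` concepts that all relate to the
    same source in the same way, so a question writer must tell them apart
    using something beyond that shared association.
    """
    grouped = defaultdict(list)
    for source, relation, target in triples:
        if target not in grouped[(source, relation)]:
            grouped[(source, relation)].append(target)
    return {
        key: targets[:n_targets]
        for key, targets in grouped.items()
        if len(targets) >= n_targets
    }
```

On the toy data only the `("river", "AtLocation")` group qualifies. In the setup the abstract describes, a crowd worker would then write questions that discriminate between those three targets.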
Today's machine learning systems are more advanced than ever, capable of automating increasingly complex tasks and serving as a critical tool for human operators. Despite recent advances, however, a critical component of Artificial Intelligence (AI) remains just out of reach – machine common sense. Defined as "the basic ability to perceive, understand, and judge things that are shared by nearly all people and can be reasonably expected of nearly all people without need for debate," common sense forms a critical foundation for how humans interact with the world around them. Possessing this essential background knowledge could significantly advance the symbiotic partnership between humans and machines. But articulating and encoding this obscure-but-pervasive capability is no easy feat.
The Winograd Schema (WS) challenge has been proposed as an alternative to the Turing Test as a test for machine intelligence. In this short paper we "situate" the WS challenge in the data-information-knowledge continuum, suggesting in the process what a good WS is. Furthermore, we suggest that the WS is a special case of a more general phenomenon in language understanding, namely the phenomenon of the "missing text". In particular, we will argue that what we usually call thinking in the process of language understanding almost always involves discovering the missing text - text that is rarely explicitly stated but is implicitly assumed as shared background knowledge. We therefore suggest extending the WS challenge to include tests beyond those involving reference resolution, including examples that require discovering the missing text in situations that are usually treated in computational linguistics under different labels, such as metonymy, quantifier scope ambiguity, lexical disambiguation, and co-predication, to name a few.
Is there a 'Simple' Machine Learning Method for Commonsense Reasoning? This is a short commentary on Trinh & Le (2018) ("A Simple Method for Commonsense Reasoning") that outlines three serious flaws in the cited paper and discusses why data-driven approaches cannot be considered serious models for the commonsense reasoning needed in natural language understanding in general, and in reference resolution in particular. A program is then asked the question "what was too small" as a followup to (1a), and the question "what was too big" as a followup to (1b). In a recent paper, Trinh and Le (2018), henceforth T&L, suggested that they have successfully formulated a "simple" machine learning method for performing commonsense reasoning, and in particular the kind of reasoning that would be required in the process of language understanding. In simple terms, T&L suggest the following method for "learning" how to successfully resolve the reference "it" in sentences such as those in (1): generate two ... The Winograd Schema challenge was named after Terry Winograd, one of the pioneers of AI, who pointed out (Winograd, 1972) the need for using commonsense knowledge in resolving a reference such as "they" in sentences such as the following: The city councilmen refused the demonstrators a permit because they a. ...
We argue that logical semantics might have faltered due to its failure in distinguishing between two fundamentally very different types of concepts: ontological concepts, that should be types in a strongly-typed ontology, and logical concepts, that are predicates corresponding to properties of and relations between objects of various ontological types. We will then show that accounting for these differences amounts to the integration of lexical and compositional semantics in one coherent framework, and to an embedding in our logical semantics of a strongly-typed ontology that reflects our commonsense view of the world and the way we talk about it in ordinary language. We will show that in such a framework a number of challenges in natural language semantics can be adequately and systematically treated.
Facebook on Tuesday officially announced that it's hired some of academia's top AI researchers, defending its practice of drawing talent from universities around the globe. Facebook AI Research (FAIR) "relies on open partnerships to help drive AI forward, where researchers have the freedom to control their own agenda," Facebook Chief AI Scientist Yann LeCun wrote in a blog post. "Ours frequently collaborate with academics from other institutions, and we often provide financial and hardware resources to specific universities." The latest hires include Carnegie Mellon Prof. Jessica Hodgins, who will lead a new FAIR lab in Pittsburgh focused on robotics, large-scale and lifelong learning, common sense reasoning, and AI in support of creativity. She'll be joined by Carnegie Mellon Prof. Abhinav Gupta, another robotics expert.