Knowledge that Everyone Knows. "People do not walk on their heads." The assertion comes about 900 statements deep into the 527,308 items that comprise the Open Mind common sense database. It's after "Laws are the rules of society" and before "The sky is blue during the day." This collection of mundane facts, which would take more than 20,000 pages to print out, consists entirely of statements so unremarkable they are barely worth stating. Most of us would correctly dismiss them as common sense.
– from D.C. Denison, Guess who's smarter. Boston Globe Online (page hosted at MIT), May 26, 2003.
Winograd Schema Challenge (WSC) was proposed as an AI-hard problem in testing computers' intelligence on common sense representation and reasoning. This paper presents the new state-of-theart on WSC, achieving an accuracy of 71.1%. We demonstrate that the leading performance benefits from jointly modelling sentence structures, utilizing knowledge learned from cutting-edge pretraining models, and performing fine-tuning. We conduct detailed analyses, showing that fine-tuning is critical for achieving the performance, but it helps more on the simpler associative problems. Modelling sentence dependency structures, however, consistently helps on the harder non-associative subset of WSC. Analysis also shows that larger fine-tuning datasets yield better performances, suggesting the potential benefit of future work on annotating more Winograd schema sentences.
Scientists are using the popular drawing game Pictionary to teach artificial intelligence common sense. AI researchers at the Allen Institute for Artificial Intelligence (AI2), a non-profit lab in Seattle, developed a version of the game called Iconary in order to teach its AllenAI artificial intelligence abstract concepts from pictures alone. Iconary was made public on 5 February in order to encourage people to play the game with AllenAI. By learning from humans, the researchers hope AllenAI will continue to develop common sense reasoning. "Iconary is one of the first times an AI system is paired in a collaborative game with a human player instead of antagonistically working against them," the Iconary website states.
The Pentagon's research wing is trying to reduce the amount of computing power and hardware needed to run advanced artificial intelligence tools, and it's turning to insects for inspiration. The Defense Advanced Research Projects Agency on Friday began soliciting ideas on how to build computing systems as small and efficient as the brains of "very small flying insects." The Microscale Biomimetic Robust Artificial Intelligence Networks program, or MicroBRAIN, could ultimately result in artificial intelligence systems that can be trained on less data and operated with less energy, according to the agency. Analyzing insects' brains, which allow them to navigate the world with minimal information, could also help researchers understand how to build AI systems capable of basic common sense reasoning. "Nature has forced on these small insects drastic miniaturization and energy efficiency, some having only a few hundred neurons in a compact form-factor, while maintaining basic functionality," officials wrote in the solicitation.
The Cyc project (initially planned from 1984 to 1994) is the world's longest-lived AI project. The idea was to create a machine with "common sense," and it was predicted that about 10 years should suffice to see significant results. That didn't quite work out, and today, after 35 years, the project is still going on -- although by now very few experts still believe in the promises made by Cyc's developers. Common sense is more than just explaining the meaning of words. For example, we have already seen how "sibling" or "daughter" can be explained in Prolog with a dictionary-like definition.
For more than five decades, DARPA has been a leader in generating groundbreaking research and development (R&D) that facilitated the advancement and application of rule-based and statistical-learning based AI technologies. Today, DARPA continues to lead innovation in AI research as it funds a broad portfolio of R&D programs, ranging from basic research to advanced technology development. DARPA believes this future, where systems are capable of acquiring new knowledge through generative contextual and explanatory models, will be realized upon the development and application of "Third Wave" AI technologies. DARPA announced in September 2018 a multi-year investment of more than $2 billion in new and existing programs called the "AI Next" campaign. Key areas of the campaign include automating critical DoD business processes, such as security clearance vetting or accrediting software systems for operational deployment; improving the robustness and reliability of AI systems; enhancing the security and resiliency of machine learning and AI technologies; reducing power, data, and performance inefficiencies; and pioneering the next generation of AI algorithms and applications, such as "explainability" and common sense reasoning.
We introduce a new benchmark task for coreference resolution, Hard-CoRe, that targets common-sense reasoning and world knowledge. Previous coreference resolution tasks have been overly vulnerable to systems that simply exploit the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of sentences in naturally occurring text. With these limitations in mind, we present a resolution task that is both challenging and realistic. We demonstrate that various coreference systems, whether rule-based, feature-rich, graphical, or neural-based, perform at random or slightly above-random on the task, whereas human performance is very strong with high inter-annotator agreement. To explain this performance gap, we show empirically that state-of-the art models often fail to capture context and rely only on the antecedents to make a decision.
The NLP and ML communities have long been interested in developing models capable of common-sense reasoning, and recent works have significantly improved the state of the art on benchmarks like the Winograd Schema Challenge (WSC). Despite these advances, the complexity of tasks designed to test common-sense reasoning remains under-analyzed. In this paper, we make a case study of the Winograd Schema Challenge and, based on two new measures of instance-level complexity, design a protocol that both clarifies and qualifies the results of previous work. Our protocol accounts for the WSC's limited size and variable instance difficulty, properties common to other common-sense benchmarks. Accounting for these properties when assessing model results may prevent unjustified conclusions.
When answering a question, people often draw upon their rich world knowledge in addition to some task-specific context. Recent work has focused primarily on answering questions based on some relevant document or content, and required very little general background. To investigate question answering with prior knowledge, we present CommonsenseQA: a difficult new dataset for commonsense question answering. To capture common sense beyond associations, each question discriminates between three target concepts that all share the same relationship to a single source drawn from ConceptNet (Speer et al., 2017). This constraint encourages crowd workers to author multiple-choice questions with complex semantics, in which all candidates relate to the subject in a similar way. We create 9,500 questions through this procedure and demonstrate the dataset's difficulty with a large number of strong baselines. Our best baseline, the OpenAI GPT (Radford et al., 2018), obtains 54.8% accuracy, well below human performance, which is 95.3%.
Today's machine learning systems are more advanced than ever, capable of automating increasingly complex tasks and serving as a critical tool for human operators. Despite recent advances, however, a critical component of Artificial Intelligence (AI) remains just out of reach – machine common sense. Defined as "the basic ability to perceive, understand, and judge things that are shared by nearly all people and can be reasonably expected of nearly all people without need for debate," common sense forms a critical foundation for how humans interact with the world around them. Possessing this essential background knowledge could significantly advance the symbiotic partnership between humans and machines. But articulating and encoding this obscure-but-pervasive capability is no easy feat.
Is there a'Simple' Machine Learning Method for Commonsense Reasoning? Menlo Park, CA This is a short Commentary on Trinh & Le (2018) ("A Simple Method for Commonsense Reasoning") that outlines three serious flaws in the cited paper and discusses why data-driven approaches cannot be considered as serious models for the commonsense reasoning needed in natural language understanding in general, and in reference resolution, in particular. A program is then asked the question "what was too small" as a followup to (1a), and the question "what was too big" as a followup to (1b). In a recent paper Trinh and Le (2018) - henceforth T&L - suggested that they have successfully formulated a „simple‟ machine learning method for performing commonsense reasoning, and in particular, the kind of reasoning that would be required in the process of language understanding. In simple terms, T&L suggest the following method for "learning" how to successfully resolve the reference "it" in sentences such as those in (1): generate two The Winograd Schema challenge was named after Terry Winograd, one of the pioneers of AI, who pointed out (Winograd, 1972) the need for using commonsense knowledge in resolving a reference such as „they‟ in sentences such as the following: The city councilmen refused the demonstrators a permit because they a.