Knowledge that Everyone Knows. "People do not walk on their heads." The assertion comes about 900 statements deep into the 527,308 items that comprise the Open Mind common sense database. It's after "Laws are the rules of society" and before "The sky is blue during the day." This collection of mundane facts, which would take more than 20,000 pages to print out, consists entirely of statements so unremarkable they are barely worth stating. Most of us would correctly dismiss them as common sense.
– from D.C. Denison, Guess who's smarter. Boston Globe Online (page hosted at MIT), May 26, 2003.
Team up with the AllenAI artificial intelligence to either draw or guess a set of unique phrases using a limited set of icons to compose your drawing. AllenAI combines advanced computer vision, language understanding, and common sense reasoning to make guesses based on your drawings and to produce its own complex scenes for you to try to guess.
Introducing common sense to natural language understanding systems has received increasing research attention. It remains a fundamental question on how to evaluate whether a system has a sense making capability. Existing benchmarks measures commonsense knowledge indirectly and without explanation. In this paper, we release a benchmark to directly test whether a system can differentiate natural language statements that make sense from those that do not make sense. In addition, a system is asked to identify the most crucial reason why a statement does not make sense. We evaluate models trained over large-scale language modeling tasks as well as human performance, showing that there are different challenges for system sense making.
The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well on all cases while outperforming the previously reported state of the art by a margin. While results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.
Selectional Preference (SP) is a commonly observed language phenomenon and proved to be useful in many natural language processing tasks. To provide a better evaluation method for SP models, we introduce SP-10K, a large-scale evaluation set that provides human ratings for the plausibility of 10,000 SP pairs over five SP relations, covering 2,500 most frequent verbs, nouns, and adjectives in American English. Three representative SP acquisition methods based on pseudo-disambiguation are evaluated with SP-10K. To demonstrate the importance of our dataset, we investigate the relationship between SP-10K and the commonsense knowledge in ConceptNet5 and show the potential of using SP to represent the commonsense knowledge. We also use the Winograd Schema Challenge to prove that the proposed new SP relations are essential for the hard pronoun coreference resolution problem.
Winograd Schema Challenge (WSC) was proposed as an AI-hard problem in testing computers' intelligence on common sense representation and reasoning. This paper presents the new state-of-theart on WSC, achieving an accuracy of 71.1%. We demonstrate that the leading performance benefits from jointly modelling sentence structures, utilizing knowledge learned from cutting-edge pretraining models, and performing fine-tuning. We conduct detailed analyses, showing that fine-tuning is critical for achieving the performance, but it helps more on the simpler associative problems. Modelling sentence dependency structures, however, consistently helps on the harder non-associative subset of WSC. Analysis also shows that larger fine-tuning datasets yield better performances, suggesting the potential benefit of future work on annotating more Winograd schema sentences.
Scientists are using the popular drawing game Pictionary to teach artificial intelligence common sense. AI researchers at the Allen Institute for Artificial Intelligence (AI2), a non-profit lab in Seattle, developed a version of the game called Iconary in order to teach its AllenAI artificial intelligence abstract concepts from pictures alone. Iconary was made public on 5 February in order to encourage people to play the game with AllenAI. By learning from humans, the researchers hope AllenAI will continue to develop common sense reasoning. "Iconary is one of the first times an AI system is paired in a collaborative game with a human player instead of antagonistically working against them," the Iconary website states.
The Pentagon's research wing is trying to reduce the amount of computing power and hardware needed to run advanced artificial intelligence tools, and it's turning to insects for inspiration. The Defense Advanced Research Projects Agency on Friday began soliciting ideas on how to build computing systems as small and efficient as the brains of "very small flying insects." The Microscale Biomimetic Robust Artificial Intelligence Networks program, or MicroBRAIN, could ultimately result in artificial intelligence systems that can be trained on less data and operated with less energy, according to the agency. Analyzing insects' brains, which allow them to navigate the world with minimal information, could also help researchers understand how to build AI systems capable of basic common sense reasoning. "Nature has forced on these small insects drastic miniaturization and energy efficiency, some having only a few hundred neurons in a compact form-factor, while maintaining basic functionality," officials wrote in the solicitation.
The Cyc project (initially planned from 1984 to 1994) is the world's longest-lived AI project. The idea was to create a machine with "common sense," and it was predicted that about 10 years should suffice to see significant results. That didn't quite work out, and today, after 35 years, the project is still going on -- although by now very few experts still believe in the promises made by Cyc's developers. Common sense is more than just explaining the meaning of words. For example, we have already seen how "sibling" or "daughter" can be explained in Prolog with a dictionary-like definition.
For more than five decades, DARPA has been a leader in generating groundbreaking research and development (R&D) that facilitated the advancement and application of rule-based and statistical-learning based AI technologies. Today, DARPA continues to lead innovation in AI research as it funds a broad portfolio of R&D programs, ranging from basic research to advanced technology development. DARPA believes this future, where systems are capable of acquiring new knowledge through generative contextual and explanatory models, will be realized upon the development and application of "Third Wave" AI technologies. DARPA announced in September 2018 a multi-year investment of more than $2 billion in new and existing programs called the "AI Next" campaign. Key areas of the campaign include automating critical DoD business processes, such as security clearance vetting or accrediting software systems for operational deployment; improving the robustness and reliability of AI systems; enhancing the security and resiliency of machine learning and AI technologies; reducing power, data, and performance inefficiencies; and pioneering the next generation of AI algorithms and applications, such as "explainability" and common sense reasoning.
We introduce a new benchmark task for coreference resolution, Hard-CoRe, that targets common-sense reasoning and world knowledge. Previous coreference resolution tasks have been overly vulnerable to systems that simply exploit the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of sentences in naturally occurring text. With these limitations in mind, we present a resolution task that is both challenging and realistic. We demonstrate that various coreference systems, whether rule-based, feature-rich, graphical, or neural-based, perform at random or slightly above-random on the task, whereas human performance is very strong with high inter-annotator agreement. To explain this performance gap, we show empirically that state-of-the art models often fail to capture context and rely only on the antecedents to make a decision.