Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval and information extraction; and as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.
Yeh, Peter Z. (Nuance Communications) | Ramachandran, Deepak (Nuance Communications) | Douglas, Benjamin (Nuance Communications) | Ratnaparkhi, Adwait (Nuance Communications) | Jarrold, William (Nuance Communications) | Provine, Ronald (Nuance Communications) | Patel-Schneider, Peter F. (Nuance Communications) | Laverty, Stephen (Nuance Communications) | Tikku, Nirvana (Nuance Communications) | Brown, Sean (Nuance Communications) | Mendel, Jeremy (Nuance Communications) | Emfield, Adam (Nuance Communications)
In this article, we report on a multiphase R&D effort to develop a conversational second screen application for TV program discovery. Our goal is to share with the community the breadth of artificial intelligence (AI) and natural language (NL) technologies required to develop such an application along with learnings from target end-users. We first give an overview of our application from the perspective of the end-user. We then present the architecture of our application along with the main AI and NL components, which were developed over multiple phases. The first phase focuses on enabling core functionality such as effectively finding programs matching the user’s intent. The second phase focuses on enabling dialog with the user. Finally, we present two user studies, corresponding to these two phases. The results from both studies demonstrate the effectiveness of our application in the target domain.
Keeping up with threat intelligence is a must for a security analyst today. There is a volume of information present in `the wild' that affects an organization. We need to develop an artificial intelligence system that scours the intelligence sources, to keep the analyst updated about various threats that pose a risk to her organization. A security analyst who is better `tapped in' can be more effective. In this paper we present, Cyber-All-Intel an artificial intelligence system to aid a security analyst. It is a system for knowledge extraction, representation and analytics in an end-to-end pipeline grounded in the cybersecurity informatics domain. It uses multiple knowledge representations like, vector spaces and knowledge graphs in a 'VKG structure' to store incoming intelligence. The system also uses neural network models to pro-actively improve its knowledge. We have also created a query engine and an alert system that can be used by an analyst to find actionable cybersecurity insights.
Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality.