The KRAKEN toolset is a comprehensive interface for knowledge acquisition that operates in conjunction with the Cyc knowledge base. The KRAKEN system is designed to allow subject-matter experts to make meaningful additions to an existing knowledge base, without the benefit of training in the areas of artificial intelligence, ontology development, or logical representation.
This paper describes a novel, unsupervised method of word sense disambiguation that is wholly semantic, drawing upon a complex, rich ontology and inference engine (the Cyc system). This method goes beyond more familiar semantic closeness approaches to disambiguation that rely on string cooccurrence or relative location in a taxonomy or concept map by 1) exploiting a rich array of properties, including higher-order properties, not available in merely taxonomic (or other first-order) systems, and 2) appealing to the semantic contribution a word sense makes to the content of the target text. Experiments show that this method produces results markedly better than chance when disambiguating word senses in a corpus of topically unrelated documents.
The Cyc project is predicated on the idea that, in order to be effective and flexible, computer software must have an understanding of the context in which its tasks are performed. We believe this context is what is known informally as "common sense." Over the last twenty years, sufficient common sense knowledge has been entered into Cyc to allow it to more effectively and flexibly support an important task: increasing its own store of world knowledge. In this paper, we describe the Cyc knowledge base and inference system, enumerate the means that it provides for knowledge elicitation, including some means suitable for use by untrained or lightly trained volunteers, review some ways in which we expect to have Cyc assist in verifying and validating collected knowledge, and describe how we expect the knowledge acquisition process to accelerate in the future.
The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge that provides a context for novel learned information - what is known informally as "common sense." Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.
Taxonomic case-based reasoning is a conversational casebased reasoning methodology that employs feature subsumption taxonomies for incremental case retrieval. Although this approach has several benefits over standard retrieval approaches, methods for automatically acquiring these taxonomies from text documents do not exist, which limits its widespread implementation. To accelerate and simplify feature acquisition and case indexing, we introduce FACIT, a domain independent framework that combines deep natural language processing techniques and generative lexicons to semi-automatically acquire case indexing taxonomies from text documents. FACIT employs a novel method to generate a logical form representation of text, and uses it to automatically extract and organize features. In contrast to standard information extraction approaches, FACIT's knowledge extraction approach should be more accurate and robust to syntactic variations in text sources due to its use of logical forms.