The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge that provides a context for novel learned information - what is known informally as "common sense." Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.
The Cyc project is predicated on the idea that, in order to be effective and flexible, computer software must have an understanding of the context in which its tasks are performed. We believe this context is what is known informally as "common sense." Over the last twenty years, sufficient common sense knowledge has been entered into Cyc to allow it to more effectively and flexibly support an important task: increasing its own store of world knowledge. In this paper, we describe the Cyc knowledge base and inference system, enumerate the means that it provides for knowledge elicitation, including some means suitable for use by untrained or lightly trained volunteers, review some ways in which we expect to have Cyc assist in verifying and validating collected knowledge, and describe how we expect the knowledge acquisition process to accelerate in the future.
This paper describes a novel, unsupervised method of word sense disambiguation that is wholly semantic, drawing upon a complex, rich ontology and inference engine (the Cyc system). This method goes beyond more familiar semantic closeness approaches to disambiguation that rely on string cooccurrence or relative location in a taxonomy or concept map by 1) exploiting a rich array of properties, including higher-order properties, not available in merely taxonomic (or other first-order) systems, and 2) appealing to the semantic contribution a word sense makes to the content of the target text. Experiments show that this method produces results markedly better than chance when disambiguating word senses in a corpus of topically unrelated documents.
Learning by reading requires integrating several strands of AI research. We describe a prototype system, Learning Reader, which combines natural language processing, a large-scale knowledge base, and analogical processing to learn by reading simplified language texts. We outline the architecture of Learning Reader and some of its system-level results, then explain how these results arise from the components. Specifically, we describe the design, implementation, and performance characteristics of a natural language understanding model (DMAP) that is tightly coupled to a knowledge base three orders of magnitude larger than previous attempts. We show that knowing the kinds of questions being asked and what might be learned can help provide more relevant, efficient reasoning. Finally, we show that analogical processing provides a means of generating useful new questions and conjectures when the system ruminates off-line about what it has read.
SemNews is a semantic news service that monitors different RSS news feeds and provides structured representations of the meaning of news. As new content appears, SemNews extracts the summary from the RSS description and processes it using OntoSem, a sophisticated text understanding system. The OntoSem environment is a rich and extensive tool for extracting and representing meaning in a language-independent way. OntoSem performs a syntactic, semantic, and pragmatic analysis of the text, resulting in its text meaning representation, or TMR. The TMRs are represented using a constructed world model, or ontology, that consists of about 8,000 concepts.
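
The first stage of the pipeline described above, pulling the summary out of each RSS item's description and handing it to a text analyzer, might be sketched as follows. This is only an illustrative sketch under stated assumptions: the sample feed, the NewsItem type, and the function names are hypothetical, and analyze is a placeholder that stands in for the text understanding step. It does not call OntoSem or reproduce its interface.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical two-item RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>http://example.org/1</link>
      <description>A short summary of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/2</link>
      <description>A short summary of the second story.</description>
    </item>
  </channel>
</rss>"""


@dataclass
class NewsItem:
    """One news item; the summary comes from the RSS <description> element."""
    title: str
    link: str
    summary: str


def extract_items(rss_xml: str) -> List[NewsItem]:
    """Parse an RSS document and collect title, link, and summary per item."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            NewsItem(
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                summary=(item.findtext("description") or "").strip(),
            )
        )
    return items


def analyze(summary: str) -> dict:
    """Placeholder for the text analyzer (assumption, not a real API).

    A real system would return a meaning representation here; this stub
    only echoes the input text and leaves the representation empty.
    """
    return {"text": summary, "tmr": None}


def process_feed(rss_xml: str) -> List[Tuple[NewsItem, dict]]:
    """Pair every extracted item with the analyzer's output for its summary."""
    return [(item, analyze(item.summary)) for item in extract_items(rss_xml)]


if __name__ == "__main__":
    for item, result in process_feed(SAMPLE_FEED):
        print(item.title, "->", result["text"])
```

Keeping extraction separate from analysis means the placeholder could be swapped for a real analyzer without touching the feed-handling code.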