The Cyc project is predicated on the idea that, in order to be effective and flexible, computer software must have an understanding of the context in which its tasks are performed. We believe this context is what is known informally as "common sense." Over the last twenty years, sufficient common sense knowledge has been entered into Cyc to allow it to more effectively and flexibly support an important task: increasing its own store of world knowledge. In this paper, we describe the Cyc knowledge base and inference system; enumerate the means it provides for knowledge elicitation, including some suitable for use by untrained or lightly trained volunteers; review some ways in which we expect Cyc to assist in verifying and validating collected knowledge; and describe how we expect the knowledge acquisition process to accelerate in the future.
This paper describes the TextLearner prototype, a knowledge-acquisition program that represents the culmination of the DARPA-IPTO-sponsored Reading Learning Comprehension seedling program, an effort to determine the feasibility of autonomous knowledge acquisition through the analysis of text. Built atop the Cyc Knowledge Base and implemented almost entirely in the formal representation language CycL, TextLearner is an anomaly among Natural Language Understanding programs. The system operates by generating an information-rich model of its target document, and uses that model to explore learning opportunities. In particular, TextLearner generates and evaluates hypotheses, not only about the content of the target document, but about how to interpret unfamiliar natural language constructions. This paper focuses on this second capability and describes four algorithms TextLearner uses to acquire rules for interpreting text.
Populating the Cyc Knowledge Base (KB) has been a manual process until very recently. However, there is now enough knowledge in Cyc to make it feasible to attempt to acquire additional knowledge autonomously. This paper describes a system that can collect and validate formally represented, fully integrated knowledge from the Web or any other electronically available text corpus, about various entities of interest (e.g.