By extending Cyc's ontology and knowledge base by approximately 2 percent, Cycorp and Cleveland Clinic Foundation (CCF) have built a system to answer clinical researchers' ad hoc queries. Because a query may be long and complex, it is only partially understood at first: it is parsed into a set of CycL (higher-order logic) fragments with open variables. But, surprisingly often, after applying various constraints (medical domain knowledge, common sense, discourse pragmatics, syntax), there is only one way to fit those fragments together, yielding a single semantically meaningful formal query P. The Semantic Research Assistant (SRA) system dispatches a series of database calls and then combines their results, logically and arithmetically, into answers to P. Seeing the first few answers stream back, users may realize that they need to abort, modify, and re-ask their query. Even before they push ASK, just knowing approximately how many answers would be returned can spark such editing. Besides real-time ad hoc query answering, queries can be bundled and persist over time.
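The idea that constraints can leave only one way to fit the fragments together can be illustrated with a minimal sketch. It is not CycL and not the SRA algorithm; the variable names, candidate types, and constraints below are invented for illustration, and brute-force enumeration stands in for whatever search the real system uses.

```python
from itertools import product

# Hypothetical candidate readings for each open variable left by the parser.
CANDIDATES = {
    "?who": ["Patient", "Physician"],
    "?what": ["Procedure", "Medication"],
}

# Hypothetical domain constraints, each a predicate over a full assignment.
def subject_is_patient(assignment):
    # Assumed domain knowledge: the cohort being queried consists of patients.
    return assignment["?who"] == "Patient"

def object_is_procedure(assignment):
    # Assumed syntax cue: the verb "underwent" takes a procedure as its object.
    return assignment["?what"] == "Procedure"

CONSTRAINTS = [subject_is_patient, object_is_procedure]

def consistent_readings(candidates, constraints):
    """Enumerate every assignment of variables and keep those passing all constraints."""
    names = sorted(candidates)
    readings = []
    for combo in product(*(candidates[name] for name in names)):
        assignment = dict(zip(names, combo))
        if all(check(assignment) for check in constraints):
            readings.append(assignment)
    return readings

readings = consistent_readings(CANDIDATES, CONSTRAINTS)
```

In this toy setting four combinations are possible and only one survives, which corresponds to the case where a single formal query P can be assembled. When more than one reading survives, a system would need some further way to choose among them.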
The Cyc project is predicated on the idea that effective machine learning depends on having a core of knowledge that provides a context for novel learned information - what is known informally as "common sense." Over the last twenty years, a sufficient core of common sense knowledge has been entered into Cyc to allow it to begin effectively and flexibly supporting its most important task: increasing its own store of world knowledge. In this paper, we present initial work on a method of using a combination of Cyc and the World Wide Web, accessed via Google, to assist in entering knowledge into Cyc. The long-term goal is automating the process of building a consistent, formalized representation of the world in the Cyc knowledge base via machine learning. We present preliminary results of this work and describe how we expect the knowledge acquisition process to become more accurate, faster, and more automated in the future.
While much health data is available online, patients who are not technically astute may be unable to access it because they may not know the relevant resources, they may be reluctant to confront an unfamiliar interface, and they may not know how to compose an answer from information provided by multiple heterogeneous resources. We describe ongoing research in using natural English text queries and automated deduction to obtain answers based on multiple structured data sources in a specific subject domain. Each English query is transformed using natural language technology into an unambiguous logical form; this is submitted to a theorem prover that operates over an axiomatic theory of the subject domain. Symbols in the theory are linked to relations in external databases known to the system. An answer is obtained from the proof, along with an English language explanation of how the answer was obtained. Answers need not be present explicitly in any of the databases, but rather may be deduced or computed from the information they provide. Although English is highly ambiguous, the natural language technology is informed by subject domain knowledge, so that readings of the query that are syntactically plausible but semantically impossible are discarded. When a question is still ambiguous, the system can interrogate the patient to determine what meaning was intended. Additional queries can clarify earlier ones or ask questions referring to previously computed answers. We describe a prototype system, Quadri, which answers questions about HIV treatment using the Stanford HIV Drug Resistance Database and other resources. Natural language processing is provided by PARC’s Bridge, and the deductive mechanism is SRI’s SNARK theorem prover. We discuss some of the problems that must be faced to make this approach work, and some of our solutions.
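The claim that an answer may be deduced from several sources without being stored in any one of them can be sketched as follows. This is only an illustration under stated assumptions: the two tables, their contents, and the single rule are invented placeholders, not the schema of the Stanford HIV Drug Resistance Database and not Quadri's axioms, and a nested loop stands in for a theorem prover.

```python
# Toy source 1 (hypothetical): mutations observed in each sample.
SAMPLE_MUTATIONS = [
    ("sample_1", "mutation_A"),
    ("sample_1", "mutation_B"),
    ("sample_2", "mutation_C"),
]

# Toy source 2 (hypothetical): which mutation confers resistance to which drug.
MUTATION_RESISTANCE = [
    ("mutation_A", "drug_X"),
    ("mutation_C", "drug_Y"),
]

def resistant_drugs(sample):
    """Apply the assumed rule
        resistant(sample, drug) <- has_mutation(sample, m) and confers(m, drug)
    and return (drug, explanation) pairs."""
    answers = []
    for observed_sample, mutation in SAMPLE_MUTATIONS:
        if observed_sample != sample:
            continue
        for known_mutation, drug in MUTATION_RESISTANCE:
            if known_mutation == mutation:
                explanation = (
                    f"{sample} carries {mutation}, and "
                    f"{mutation} confers resistance to {drug}"
                )
                answers.append((drug, explanation))
    return answers

result = resistant_drugs("sample_1")
```

Neither table records that sample_1 is resistant to drug_X; the fact follows only from combining them, and the explanation string is built from the same steps that produced the answer.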
This paper describes the inference explanation capabilities of Cyc, a logical reasoning system that includes a huge "commonsense" knowledge base and an inference engine that supports both question answering and hypothesis generation. Cyc allows the user to compose queries by means of English templates, and tries to find answers via deductive reasoning. If deduction is fruitless, Cyc resorts to abduction, filling in missing pieces of logical arguments with plausible conjectures to obtain provisional answers. Cyc presents its answers and chains of reasoning to the user in English, provides drilldown to external source references whenever possible, and reasons about its own proofs to determine optimal ways of presenting them to the user. When a chain of reasoning relies on conjectures introduced via abduction, the user can interact with the inference explanation to confirm or deny the abduced supports. These capabilities are grounded in the integration of Cyc's natural language components with the knowledge base and inference engine, and in Cyc's capacity to maintain an explicit in-memory record of the facts, rules, and calculations used to produce successful proofs during inference.
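The fallback from deduction to abduction can be sketched with a small propositional backward chainer. This is a simplified illustration, not Cyc's inference engine: the rules, facts, and the set of abducible propositions are invented, the rule set is assumed to be acyclic, and a real system would rank conjectures by plausibility rather than accept any abducible one.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical Horn rules: head -> list of alternative bodies (assumed acyclic).
RULES: Dict[str, List[List[str]]] = {
    "grass_wet": [["rained"]],
    "shoes_wet": [["grass_wet", "walked_on_grass"]],
}
FACTS: Set[str] = {"walked_on_grass"}   # known to be true
ABDUCIBLE: Set[str] = {"rained"}        # may be conjectured if needed

def prove(goal: str, allow_abduction: bool) -> Optional[Set[str]]:
    """Return the conjectures a proof of `goal` relies on, or None if no proof exists."""
    if goal in FACTS:
        return set()
    for body in RULES.get(goal, []):
        assumed: Set[str] = set()
        for subgoal in body:
            sub = prove(subgoal, allow_abduction)
            if sub is None:
                break               # this body fails; try the next alternative
            assumed |= sub
        else:
            return assumed          # every subgoal in this body was proved
    if allow_abduction and goal in ABDUCIBLE:
        return {goal}               # fill the gap with a conjecture
    return None

def answer(goal: str) -> Tuple[str, Set[str]]:
    """Try pure deduction first; fall back to abduction only if that fails."""
    if prove(goal, allow_abduction=False) is not None:
        return ("deduced", set())
    conjectures = prove(goal, allow_abduction=True)
    if conjectures is not None:
        return ("provisional", conjectures)
    return ("unknown", set())

status, supports = answer("shoes_wet")
```

Here `shoes_wet` cannot be deduced from the facts alone, so the answer is provisional and rests on the conjecture `rained`. Returning the conjectures alongside the answer is what would let a user confirm or deny each abduced support.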
This article gives a detailed description of True Knowledge, a commercial, open-domain question answering platform. The system combines a large and growing structured knowledge base of common sense, factual, and lexical knowledge; a natural language translation system that turns user questions into internal language-independent queries; and an inference system that can answer those queries using both directly represented and inferred knowledge. The system is live and answers millions of questions per month asked by internet users.