Information Retrieval (IR) is concerned with the identification of documents in a collection that are relevant to a given information need, usually represented as a query containing terms or keywords, which are supposed to be a good description of what the user is looking for. IR systems may improve their effectiveness (i.e., increasing the number of relevant documents retrieved) by using a process of query expansion, which automatically adds new terms to the original query posed by an user. In this paper we develop a method of query expansion based on Bayesian networks. Using a learning algorithm, we construct a Bayesian network that represents some of the relationships among the terms appearing in a given document collection; this network is then used as a thesaurus (specific for that collection). We also report the results obtained by our method on three standard test collections.
Answering queries posed over knowledge bases is a central problem in knowledge representation and database theory. In the database area, checking query containment is an important query optimization and schema integration technique. In knowledge representation it has been used for object classification, schema integration, service discovery, and more. In the presence of a knowledge base, the problem of query containment is strictly related to that of query answering; indeed, the two are reducible to each other; we focus on the latter, and our results immediately extend to the former.
Recently developed production systems enable users to specify an appropriate ordering or a clustering of join operations. Various efficiency heuristics have been used to optimize production rules manually. The problem addressed in this paper is how to automatically determine the best join structure for production system programs. Our algorithm is not to directly apply the efficiency heuristics to programs, but rather to enumerate possible join structures under various constraints. Evaluation results demonstrate this algorithm generates a more efficient program than the one obtained by manual optimization.
We consider a generalization of instance retrieval over knowledge bases that provides users with assertions in which descriptions of qualifying objects are given in addition to their identifiers. Notably, this involves a transfer of basic database paradigms involving caching and query rewriting in the context of an assertion retrieval algebra. We present an optimization framework for this algebra, with a focus on finding plans that avoid any need for general knowledge base reasoning at query execution time when sufficient cached results of earlier requests exist.
We examine the query planning problem in information integration systems in the presence of sources that contain disjunctive information. We show that datalog, the language of choice for representing query plans in information integration systems, is not sufficiently expressive in this case. We prove that disjunctive datalog with inequality is sufficiently expressive, and present a construction of query plans that are guaranteed to extract all available information from disjunctive sources. 1 Introduction We examine the query planning problem in information integration systems in the presence of sources that contain disjunctive information. The query planning problem in such systems can be formally stated as the problem of answering queries using views (Levy et al. 1995; Ullman 1997; Duschka & Genesereth 1997a): View definitions describe the information stored by sources, and query planning requires to rewrite a query into one that only uses these views. In this paper we are going to extend the algorithm for answermg queries using conjunctive views that was introduced in (Duschka& Genesereth 1997a) to also be able to handle disjunction in the view definitions.