Query Expansion in Information Retrieval Systems using a Bayesian Network-Based Thesaurus

arXiv.org Artificial Intelligence

Information Retrieval (IR) is concerned with the identification of documents in a collection that are relevant to a given information need, usually represented as a query containing terms or keywords, which are supposed to be a good description of what the user is looking for. IR systems may improve their effectiveness (i.e., increase the number of relevant documents retrieved) by using a process of query expansion, which automatically adds new terms to the original query posed by a user. In this paper we develop a method of query expansion based on Bayesian networks. Using a learning algorithm, we construct a Bayesian network that represents some of the relationships among the terms appearing in a given document collection; this network is then used as a thesaurus (specific for that collection). We also report the results obtained by our method on three standard test collections.
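As a rough illustration of the query expansion idea described in this abstract, the sketch below builds a collection-specific thesaurus and uses it to add terms to a query. It does not reproduce the paper's Bayesian network learning algorithm: as a loudly simplified stand-in, it estimates P(b present | a present) from plain document co-occurrence counts. The function names, the threshold, and the toy documents are all assumptions made for the example.

```python
from collections import Counter
from itertools import permutations


def build_thesaurus(docs):
    """Estimate P(b present | a present) from document co-occurrence counts.

    Simplified stand-in for a learned Bayesian network over terms.
    """
    term_df = Counter()   # number of documents containing each term
    pair_df = Counter()   # number of documents containing both terms
    for doc in docs:
        terms = set(doc.lower().split())
        term_df.update(terms)
        pair_df.update(permutations(terms, 2))
    return {(a, b): n / term_df[a] for (a, b), n in pair_df.items()}


def expand_query(query, thesaurus, threshold=0.6, max_new=3):
    """Append terms whose estimated probability given a query term is high."""
    original = query.lower().split()
    scores = {}
    for (a, b), p in thesaurus.items():
        if a in original and b not in original and p >= threshold:
            scores[b] = max(scores.get(b, 0.0), p)
    # Highest probability first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return original + ranked[:max_new]


# Hypothetical toy collection, not one of the paper's test collections.
docs = [
    "bayesian network inference",
    "bayesian network learning",
    "query expansion thesaurus",
    "bayesian statistics",
]
thesaurus = build_thesaurus(docs)
expanded = expand_query("network", thesaurus)
print(expanded)
```

In this toy collection every document containing "network" also contains "bayesian", so "bayesian" is the only term that clears the threshold and is added to the query.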


Cross-lingual Propagation for Morphological Analysis

AAAI Conferences

Multilingual parallel text corpora provide a powerful means for propagating linguistic knowledge across languages. We present a model which jointly learns linguistic structure for each language while inducing links between them. Our model supports fully symmetrical knowledge transfer, utilizing any combination of supervised and unsupervised data across language barriers. The proposed nonparametric Bayesian model effectively combines cross-lingual alignment with target language predictions. This architecture is a potent alternative to projection methods which decompose these decisions into two separate stages. We apply this approach to the task of morphological segmentation, where the goal is to separate a word into its individual morphemes. When tested on a parallel corpus of Hebrew and Arabic, our joint bilingual model effectively incorporates all available evidence from both languages, yielding significant performance gains.


Query Expansion in Description Logics and CARIN

Marie-Christine Rousset

AAAI Conferences

Given a knowledge base, expanding a query consists of determining all the ways of deriving it from atoms built on some distinguished predicates. In this paper, we address the problem of determining the expansions of a query in description logics and CARIN. Description Logics are logical formalisms for representing classes of objects (called concepts) and their relationships (expressed by binary relations called roles). Much of the research in description logics has concentrated on algorithms for checking subsumption between concepts and satisfiability of knowledge bases (see e.g.


Flexible and Scalable Query Planning in Distributed and Heterogeneous Environments

AAAI Conferences

José Luis Ambite & Craig A. Knoblock, Information Sciences Institute and Department of Computer Science, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA 90292, USA, {ambite, knoblock}@isi.edu

Abstract: We present the application of the Planning by Rewriting (PbR) framework to query planning in distributed and heterogeneous environments. PbR is a new paradigm for efficient high-quality planning that exploits plan rewriting rules and efficient local search techniques to transform an easy-to-generate, but possibly suboptimal, initial plan into a high-quality plan. The resulting planner is scalable, flexible, has anytime behavior, and, applied to query planning, yields a novel combination of traditional query optimization with heterogeneous information source selection. Query planners are the core component of mediator systems, which are becoming increasingly important in a world of interconnected information, and constitute excellent testbeds for planning technology.

Introduction: Query planning is a problem of considerable practical significance. It lies at the core of traditional database systems and of mediators, systems that integrate information from multiple distributed and heterogeneous sources. Mediators are becoming increasingly important given the current explosion of information accessible through networks. Query planning in mediators presents particular challenges for planning technology. First, it is a highly combinatorial problem, where complex queries have to be composed from the relevant sources among hundreds of available information sources.
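The general shape of the approach described here (start from an easy-to-generate plan, then improve it with rewriting rules and local search) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the PbR system itself: the plan is a left-deep join order over hypothetical sources, the single rewriting rule swaps adjacent sources, the cost model and `join_divisor` are invented for the example, and the search is plain hill climbing.

```python
def plan_cost(order, sizes, join_divisor=10):
    """Toy cost of a left-deep join plan: rows scanned plus intermediate rows.

    Assumption: each join keeps 1/join_divisor of the cross product.
    """
    rows = sizes[order[0]]
    cost = rows
    for source in order[1:]:
        rows = rows * sizes[source] // join_divisor
        cost += rows
    return cost


def rewrites(order):
    """Hypothetical rewriting rule: swap two adjacent sources in the order."""
    for i in range(len(order) - 1):
        new = list(order)
        new[i], new[i + 1] = new[i + 1], new[i]
        yield tuple(new)


def plan_by_rewriting(initial, sizes, max_steps=100):
    """Hill climbing over rewritten plans, starting from a valid initial plan.

    Every intermediate plan is complete and valid, so the search can be
    stopped at any step and still return a usable plan.
    """
    current = tuple(initial)
    current_cost = plan_cost(current, sizes)
    for _ in range(max_steps):
        best = min(rewrites(current), key=lambda p: plan_cost(p, sizes))
        best_cost = plan_cost(best, sizes)
        if best_cost >= current_cost:
            break  # local optimum: no rewrite improves the plan
        current, current_cost = best, best_cost
    return current, current_cost


# Hypothetical source sizes (rows); the names are placeholders.
sizes = {"A": 1000, "B": 10, "C": 100}
plan, cost = plan_by_rewriting(("A", "C", "B"), sizes)
print(plan, cost)
```

Starting from the deliberately poor order (A, C, B), the search moves the small sources to the front one swap at a time. Like any local search, it can stop at a local optimum on less friendly cost landscapes.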


An Assertion Retrieval Algebra for Object Queries over Knowledge Bases

AAAI Conferences

We consider a generalization of instance retrieval over knowledge bases that provides users with assertions in which descriptions of qualifying objects are given in addition to their identifiers. Notably, this involves a transfer of basic database paradigms involving caching and query rewriting in the context of an assertion retrieval algebra. We present an optimization framework for this algebra, with a focus on finding plans that avoid any need for general knowledge base reasoning at query execution time when sufficient cached results of earlier requests exist.