Implementing Cross-Language Text Retrieval Systems for Large-scale Text Collections and the World Wide Web

Mark W. Davis and William C. Ogden

AAAI Conferences

QUILT (Query User Interface with Light Translations) is a prototype implementation of a complete cross-language text retrieval (CLTR) system that takes English queries and produces English gloss translations of Spanish documents. The system indexes the Spanish documents in Spanish but converts the English query into an equivalent set of Spanish terms through a novel combination of lexical methods and parallel-corpus disambiguation. Similar methods are applied to the returned documents to produce a simple translation that non-Spanish speakers can examine to gauge the document's relevance to the original English query. The system integrates traditional, glossary-based machine translation technology with information retrieval approaches and demonstrates that relatively simple term substitution and disambiguation approaches can be viable for cross-language text retrieval. Components of QUILT have been used to build a CLTR interface to WWW-based search services.
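The abstract does not spell out how QUILT's term substitution and parallel-corpus disambiguation work internally. The Python sketch below illustrates the general idea under stated assumptions: each English query term is replaced by its Spanish glossary candidates, and among all candidate combinations the one whose terms co-occur most often in a Spanish corpus is kept. The glossary `GLOSSARY`, the toy corpus `CORPUS`, and the pairwise co-occurrence score are hypothetical stand-ins, not QUILT's actual lexicon or scoring method.

```python
from itertools import product

# Hypothetical toy English->Spanish glossary (assumption, not QUILT's lexicon).
GLOSSARY = {
    "bank": ["banco", "orilla"],
    "river": ["río"],
    "loan": ["préstamo"],
}

# Hypothetical Spanish sentences standing in for the Spanish side of a parallel corpus.
CORPUS = [
    "el banco aprobó el préstamo",
    "la orilla del río estaba mojada",
    "caminamos por la orilla del río",
]


def cooccurrence(a, b, docs):
    """Count the documents that contain both Spanish terms a and b."""
    return sum(1 for d in docs if a in d.split() and b in d.split())


def translate_query(query, glossary, docs):
    """Substitute each known English term with its Spanish candidates and keep
    the combination whose terms co-occur most often (pairwise) in the corpus."""
    terms = [t for t in query.lower().split() if t in glossary]
    candidates = [glossary[t] for t in terms]
    best, best_score = None, -1
    for combo in product(*candidates):
        score = sum(cooccurrence(a, b, docs)
                    for i, a in enumerate(combo) for b in combo[i + 1:])
        if score > best_score:
            best, best_score = combo, score
    return list(best) if best else []
```

For example, `translate_query("river bank", GLOSSARY, CORPUS)` returns `["río", "orilla"]`, because "orilla" co-occurs with "río" in the corpus while "banco" does not. Enumerating every combination grows exponentially with query length, so a practical system would need to prune candidates.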

Personalized Text-Based Music Retrieval

AAAI Conferences

We consider the problem of personalized text-based music retrieval, in which users' histories of preferences are taken into account in addition to their textual queries. Current retrieval methods mostly rely on song metadata, which limits the query vocabulary; moreover, gathering this information is very costly for large music collections. Instead, we use music annotations retrieved from social tagging websites as textual descriptions of songs. Considering a user's profile and the preference patterns of all users, as in collaborative filtering approaches, can yield personalized and more satisfactory results. The main challenge is how to include both user profiles and song metadata in the retrieval model. In this paper, we propose a hierarchical probabilistic model that takes into account users' preference histories as well as tag co-occurrences in songs. Our model is an extension of latent Dirichlet allocation (LDA) in which topics are formed as joint clusterings of songs and tags. These topics capture tag associations and user preferences and correspond to different music tastes. Each user's profile is represented as a distribution over topics, which reflects the user's interest in different types of music. We explain how our model can be used for contextual retrieval. Our experimental results show a significant improvement in retrieval when user profiles are taken into account.
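As a rough illustration of how a topic-based user profile can change a ranking, the sketch below scores a song for a tag query by mixing over topics: score(s) = Σ_z P(z | user) · P(s | z) · Π_{w in query} P(w | z). This scoring rule and all probability tables (`THETA_USER`, `P_SONG`, `P_TAG`) are hypothetical assumptions chosen for illustration; they are not the paper's model, its parameters, or its inference procedure.

```python
# Hypothetical parameters for K = 2 topics ("music tastes"); all values are made up.
THETA_USER = {"alice": [0.9, 0.1], "bob": [0.1, 0.9]}  # P(topic | user)
P_SONG = {                                             # P(song | topic), one column per topic
    "song_a": [0.7, 0.1],
    "song_b": [0.3, 0.9],
}
P_TAG = {                                              # P(tag | topic), one column per topic
    "mellow": [0.6, 0.2],
    "guitar": [0.4, 0.8],
}


def score(song, query_tags, user):
    """Mix over topics: sum_z P(z|user) * P(song|z) * prod_w P(w|z)."""
    total = 0.0
    for z, p_topic in enumerate(THETA_USER[user]):
        p = p_topic * P_SONG[song][z]
        for tag in query_tags:
            p *= P_TAG[tag][z]
        total += p
    return total


def rank(query_tags, user):
    """Return song ids ordered by descending personalized score."""
    return sorted(P_SONG, key=lambda s: score(s, query_tags, user), reverse=True)
```

With the query `["guitar"]`, `rank` puts `song_a` first for `alice`, whose profile leans toward topic 0, and `song_b` first for `bob`, whose profile leans toward topic 1, even though the query text is identical.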

Information retrieval document search using vector space model in R


Note that there are many variations in the way term frequency (tf) and inverse document frequency (idf) are calculated; this post has shown only one of them. Other recommended variations of tf and idf are listed on Wikipedia.
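To make the variants concrete, the sketch below implements a few commonly cited tf weightings (binary, raw count, log normalization log(1 + f), and double normalization with K = 0.5) and idf weightings (unary, standard log(N / df), and smooth log(N / (1 + df)) + 1), then ranks documents by cosine similarity in the vector space model. It is written in Python rather than R, uses naive whitespace tokenization, and is an illustrative sketch, not the post's original code; the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_weight(count, max_count, variant="raw"):
    """Term-frequency variants; only applied to terms present in a document."""
    if variant == "binary":
        return 1.0
    if variant == "raw":
        return float(count)
    if variant == "log":
        return math.log(1 + count)
    if variant == "double_norm":  # double normalization with K = 0.5
        return 0.5 + 0.5 * count / max_count
    raise ValueError(f"unknown tf variant: {variant}")


def idf_weight(df, n_docs, variant="standard"):
    """Inverse-document-frequency variants; df = number of documents containing the term."""
    if variant == "unary":
        return 1.0
    if variant == "standard":
        return math.log(n_docs / df) if df else 0.0
    if variant == "smooth":
        return math.log(n_docs / (1 + df)) + 1.0
    raise ValueError(f"unknown idf variant: {variant}")


def vectorize(tokens, doc_freq, n_docs, tf_variant, idf_variant):
    """Build a sparse tf-idf vector (term -> weight) for one token list."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {t: tf_weight(c, max_count, tf_variant)
            * idf_weight(doc_freq.get(t, 0), n_docs, idf_variant)
            for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def search(query, docs, tf_variant="raw", idf_variant="standard"):
    """Return (score, doc_index) pairs sorted by descending cosine similarity."""
    tokenized = [d.lower().split() for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    doc_vecs = [vectorize(toks, doc_freq, n, tf_variant, idf_variant) for toks in tokenized]
    q_vec = vectorize(query.lower().split(), doc_freq, n, tf_variant, idf_variant)
    scores = [(cosine(q_vec, dv), i) for i, dv in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[0], reverse=True)
```

For example, with `docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]`, `search("cat mat", docs)` ranks document 0 first; the other two share no terms with the query (without stemming, "cats" does not match "cat") and score 0. Swapping `tf_variant` or `idf_variant` changes only the weights, not the retrieval procedure.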


AAAI Conferences

Text processing has attracted great interest over the last several years, prompted by technical advances in storage, searching, telecommunications, and user interfaces. The growing volume of generated text creates problems for storage and retrieval, and there is no sign of this trend abating.

An Assertion Retrieval Algebra for Object Queries over Knowledge Bases

AAAI Conferences

We consider a generalization of instance retrieval over knowledge bases that provides users with assertions in which descriptions of qualifying objects are given in addition to their identifiers. Notably, this involves transferring basic database paradigms, such as caching and query rewriting, to the context of an assertion retrieval algebra. We present an optimization framework for this algebra, with a focus on finding plans that avoid any need for general knowledge base reasoning at query execution time when sufficient cached results of earlier requests exist.
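The caching idea can be illustrated with a deliberately simple toy. In the sketch below, a query is a named concept conjoined with attribute filters; if assertions for that concept are already cached and every cached description carries the filtered attributes, the query is rewritten as a selection over the cache and no reasoning is performed. Otherwise a fallback reasoner is called once and its result is cached. The `Query` and `AssertionCache` classes and the `reasoner` callable are hypothetical; this is not the paper's algebra or its plan optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    concept: str          # named concept, e.g. "Employee"
    filters: tuple = ()   # (attribute, value) pairs conjoined with the concept


class AssertionCache:
    def __init__(self, reasoner):
        self.reasoner = reasoner   # hypothetical fallback doing full KB reasoning
        self.cache = {}            # concept -> list of (object_id, description dict)
        self.reasoner_calls = 0

    @staticmethod
    def _select(assertions, filters):
        """Keep assertions whose descriptions match every (attribute, value) filter."""
        return [(oid, desc) for oid, desc in assertions
                if all(desc.get(a) == v for a, v in filters)]

    def answer(self, q):
        cached = self.cache.get(q.concept)
        # The cache suffices only if every cached description includes the filtered attributes.
        if cached is not None and all(a in desc for _, desc in cached for a, _ in q.filters):
            return self._select(cached, q.filters)   # no reasoning needed
        self.reasoner_calls += 1
        result = self.reasoner(Query(q.concept))
        self.cache[q.concept] = result
        return self._select(result, q.filters)
```

For instance, after one query for employees in one department has populated the cache for `"Employee"`, a second query for a different department is answered by filtering the cached assertions, and `reasoner_calls` stays at 1.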