Information Retrieval
Global Ranking Using Continuous Conditional Random Fields
Qin, Tao, Liu, Tie-yan, Zhang, Xu-dong, Wang, De-sheng, Li, Hang
This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for `local ranking', in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
Predictive Indexing for Fast Search
Goel, Sharad, Langford, John, Strehl, Alexander L.
We tackle the computational problem of query-conditioned search. Given a machine-learned scoring rule and a query distribution, we build a predictive index by precomputing lists of potential results sorted based on an expected score of the result over future queries. The predictive index datastructure supports an anytime algorithm for approximate retrieval of the top elements. The general approach is applicable to webpage ranking, internet advertisement, and approximate nearest neighbor search. It is particularly effective in settings where standard techniques (e.g., inverted indices) are intractable. We experimentally find substantial improvement over existing methods for internet advertisement and approximate nearest neighbors.
Statistical Consistency of Top-k Ranking
Xia, Fen, Liu, Tie-yan, Li, Hang
This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions.
Adapting to the Shifting Intent of Search Queries
Syed, Umar, Slivkins, Aleksandrs, Mishra, Nina
Search engines today present results that are often oblivious to recent shifts in intent. For example, the meaning of the query independence day shifts in early July to a US holiday and to a movie around the time of the box office release. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, etc account for 1/2 the search queries. This paper shows that the signals a search engine receives can be used to both determine that a shift in intent happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases.
Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
Ram, Parikshit, Lee, Dongryeol, Ouyang, Hua, Gray, Alexander G.
The longstanding problem of efficient nearest-neighbor (NN) search has ubiquitous applicationsranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NNsearch becomes computationally prohibitive;(1) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.
Polynomial Semantic Indexing
Bai, Bing, Weston, Jason, Grangier, David, Collobert, Ronan, Sadamasa, Kunihiko, Qi, Yanjun, Cortes, Corinna, Mohri, Mehryar
We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods.
Why so? or Why no? Functional Causality for Explaining Query Answers
Meliou, Alexandra, Gatterbauer, Wolfgang, Moore, Katherine F., Suciu, Dan
In this paper, we propose causality as a unified framework to explain query answers and non-answers, thus generalizing and extending several previously proposed approaches of provenance and missing query result explanations. We develop our framework starting from the well-studied definition of actual causes by Halpern and Pearl [13]. After identifying some undesirable characteristics of the original definition, we propose functional causes as a refined definition of causality with several desirable properties. These properties allow us to apply our notion of causality in a database context and apply it uniformly to define the causes of query results and their individual contributions in several ways: (i) we can model both provenance as well as nonanswers, (ii) we can define explanations as either data in the input relations or relational operations in a query plan, and (iii) we can give graded degrees of responsibility to individual causes, thus allowing us to rank causes. In particular, our approach allows us to explain contributions to relational aggregate functions and to rank causes according to their respective responsibilities. We give complexity results and describe polynomial algorithms for evaluating causality in tractable cases. Throughout the paper, we illustrate the applicability of our framework with several examples. Overall, we develop in this paper the theoretical foundations of causality theory in a database context.
Cross-lingual keyword assignment
Introduction In the last years, many useful NLP tools have been developed and many of them are now even available commercially. Most of these tools are monolingual or multi-monolingual, meaning that the software can deal with more than one language, but that the re sults will al ways be displayed in the same language as the text. We therefore distinguish these applications from cross-lingual software, which is software that helps to transgress the language bound ary. Examples for such applications are machine translation and cross-lingual document retrieval, i.e. retrieval using search engines which allow to en ter a search term in one language and which also yield results in other languages, usually because the query is translated in one way or another. In our eyes, cross-lingual applications are currently the bottleneck of available NLP tools. To our knowledge, there are no applications that allow comparing documents written in dif ferent languages with each other and there are very few which give users a quick overview of the ap proximate contents of documents written in different languages.
Considering users' behaviours in improving the responses of an information base
Afolabi, Babajide, Thiery, Odile
In this paper, our aim is to propose a model that helps in the efficient use of an information system by users, within the organization represented by the IS, in order to resolve their decisional problems. In other words we want to aid the user within an organization in obtaining the information that corresponds to his needs (informational needs that result from his decisional problems). This type of information system is what we refer to as economic intelligence system because of its support for economic intelligence processes of the organisation. Our assumption is that every EI process begins with the identification of the decisional problem which is translated into an informational need. This need is then translated into one or many information search problems (ISP). We also assumed that an ISP is expressed in terms of the user's expectations and that these expectations determine the activities or the behaviors of the user, when he/she uses an IS. The model we are proposing is used for the conception of the IS so that the process of retrieving of solution(s) or the responses given by the system to an ISP is based on these behaviours and correspond to the needs of the user.
The Application of Fuzzy Logic to the Construction of the Ranking Function of Information Retrieval Systems
The quality of the ranking function is an important factor that determines the quality of the Information Retrieval system. Each document is assigned a score by the ranking function; the score indicates the likelihood of relevance of the document given a query. In the vector space model, the ranking function is defined by a mathematic expression. We propose a fuzzy logic (FL) approach to defining the ranking function. FL provides a convenient way of converting knowledge expressed in a natural language into fuzzy logic rules. The resulting ranking function could be easily viewed, extended, and verified: * if (tf is high) and (idf is high) > (relevance is high); * if (overlap is high) > (relevance is high). By using above FL rules, we are able to achieve performance approximately equal to the state of the art search engine Apache Lucene (deltaP10 +0.92%; deltaMAP -0.1%). The fuzzy logic approach allows combining the logic-based model with the vector model. The resulting model possesses simplicity and formalism of the logic based model, and the flexibility and performance of the vector model.