User Model-Based Intent-Aware Metrics for Multilingual Search Evaluation

arXiv.org Machine Learning

Despite the growing importance of the multilingual aspect of web search, no appropriate offline metrics for evaluating its quality have been proposed so far. At the same time, personal language preferences can be regarded as intents of a query. This approach translates the multilingual search problem into a particular task of search diversification. Furthermore, the standard intent-aware approach can be adopted to build a diversified metric for multilingual search on the basis of a classical IR metric such as ERR. The intent-aware approach estimates user satisfaction under a user behavior model. We show, however, that the underlying user behavior model is not realistic in the multilingual case, and that the resulting intent-aware metric does not appropriately estimate user satisfaction. We develop a novel approach to building intent-aware user behavior models that overcome these limitations and convert to quality metrics that correlate better with standard online metrics of user satisfaction.
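For orientation, the standard intent-aware construction mentioned in this abstract can be sketched as follows. This is a minimal illustration of the classical ERR cascade model and its intent-aware average (the baseline the abstract criticizes), not the novel model the authors propose. The function names, the 0 to 4 grading scale, and the language intents in the usage note are assumptions made for the example.

```python
from typing import Dict, List


def err(grades: List[int], max_grade: int = 4) -> float:
    """Expected Reciprocal Rank for one ranked list of relevance grades."""
    p_reach = 1.0  # probability that the user reaches the current rank
    score = 0.0
    for rank, grade in enumerate(grades, start=1):
        # Probability that the document at this rank satisfies the user.
        p_satisfied = (2 ** grade - 1) / 2 ** max_grade
        score += p_reach * p_satisfied / rank
        p_reach *= 1.0 - p_satisfied
    return score


def err_ia(intent_probs: Dict[str, float],
           grades_by_intent: Dict[str, List[int]],
           max_grade: int = 4) -> float:
    """Intent-aware ERR: per-intent ERR weighted by intent probability."""
    return sum(
        prob * err(grades_by_intent[intent], max_grade)
        for intent, prob in intent_probs.items()
    )
```

As a usage example, with hypothetical intents `{"en": 0.5, "ru": 0.5}` and per-intent grades `{"en": [4, 0], "ru": [0, 4]}`, `err_ia` averages the two per-language ERR values. Each intent is scored as if the user examined the ranking independently of document language, which is the kind of behavioral assumption the abstract argues is unrealistic in the multilingual case.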


Detecting Multilingual and Multi-Regional Query Intent in Web Search

AAAI Conferences

With the rapid growth of commercial search engines, detecting the multilingual and multi-regional intent underlying search queries has become a critical challenge in serving international users with diverse language and region requirements. We introduce a probabilistic query intent model whose input is the number of clicks on documents from different regions and in different languages, and whose output is a smoothed probability distribution over multilingual and multi-regional query intents. In an editorial test of the intent classifier's accuracy, our probabilistic model improves the accuracy of multilingual intent detection by 15% and of multi-regional intent detection by 18%. To improve web search quality, we propose a set of new ranking features that combine multilingual and multi-regional query intent with document language/region attributes, and apply different approaches to integrating intent information so that it directly affects ranking. The experiments show that the novel features provide a 2.31% NDCG@1 improvement and a 1.81% NDCG@5 improvement.
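The abstract does not say how the click counts are smoothed. As a rough sketch of the general idea only, the example below turns per-language click counts into a probability distribution using additive (Laplace) smoothing. The smoothing method, the function name, and the `alpha` parameter are all assumptions for illustration and are not taken from the paper.

```python
from typing import Dict


def smoothed_intent_distribution(click_counts: Dict[str, int],
                                 alpha: float = 1.0) -> Dict[str, float]:
    """Convert click counts per language/region into a smoothed distribution.

    Additive smoothing is an assumed stand-in for the paper's model: every
    candidate intent receives `alpha` pseudo-clicks, so an intent with no
    observed clicks still gets a non-zero probability.
    """
    if not click_counts:
        raise ValueError("click_counts must contain at least one intent")
    total = sum(click_counts.values()) + alpha * len(click_counts)
    return {
        intent: (count + alpha) / total
        for intent, count in click_counts.items()
    }
```

For instance, counts of 8 clicks on English documents and 0 on Spanish ones with `alpha=1.0` give probabilities of 0.9 and 0.1, instead of the unsmoothed 1.0 and 0.0.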


Exploring Client-Side Instrumentation for Personalized Search Intent Inference: Preliminary Experiments

AAAI Conferences

Clickthrough data on search results has been used successfully to infer user interests and preferences, but it is often noisy and potentially ambiguous. The main reason is that clickthrough features inherently represent the majority of user intents rather than the information needs of individual users for a given query instance. In this paper, we explore how to recover the personalized search intent for each search instance, using more sensitive and richer client-side instrumentation (including mouse movements) to provide additional insight into the intent behind each query instance. We report preliminary results on learning to infer query intent from rich instrumentation of search result pages. In particular, we explore whether we can automatically distinguish between query classes such as navigational and informational queries. Our preliminary results confirm our intuition that client-side instrumentation is superior for personalized user intent inference, and suggest interesting avenues for future exploration.


Session Based Click Features for Recency Ranking

AAAI Conferences

Recency ranking refers to ranking web results by accounting for both relevance and freshness. This is particularly important for "recency sensitive" queries such as breaking news queries. In this study, we propose a set of novel click features to improve machine-learned recency ranking. Rather than computing simple aggregate click-through rates, we derive these features from temporal click-through data and query reformulation chains. One of the features we use is click buzz, which captures spiking interest in a URL for a query. We also propose time-weighted click-through rates, which treat recent observations as exponentially more important. The promotion of fresh content is typically determined by the query intent, which can change dynamically over time. Quite often, users' query reformulations convey clues about the query's intent. Hence we enrich our click features by following query reformulations, which typically benefits the first query in the chain of reformulations. Our experiments show that these novel features can improve the NDCG5 of a major online search engine's ranking for "recency sensitive" queries by up to 1.57%. This is one of the very few studies that exploit temporal click-through data and query reformulations for recency ranking.
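The idea of a time-weighted click-through rate can be illustrated with a small sketch. The abstract only states that recent observations count exponentially more, so the exact weighting below (exponential decay with a configurable half-life), the function name, and the `(timestamp, clicked)` input format are assumptions rather than the authors' definition.

```python
import math
from typing import Iterable, Tuple


def time_weighted_ctr(observations: Iterable[Tuple[float, bool]],
                      now: float,
                      half_life: float) -> float:
    """Click-through rate in which older impressions are discounted.

    Each observation is (timestamp, clicked). An impression that is
    `half_life` time units old counts half as much as one seen at `now`.
    """
    decay = math.log(2) / half_life
    weighted_clicks = 0.0
    weighted_views = 0.0
    for timestamp, clicked in observations:
        weight = math.exp(-decay * (now - timestamp))
        weighted_views += weight
        if clicked:
            weighted_clicks += weight
    # With no impressions there is no evidence, so report a rate of zero.
    return weighted_clicks / weighted_views if weighted_views > 0 else 0.0
```

With one unclicked impression at time 0 and one clicked impression at time 10, evaluated at `now=10` with `half_life=10`, the weights are 0.5 and 1.0, so the weighted rate is about 0.67 rather than the unweighted 0.5. The more recent click dominates, which is the behavior the abstract describes for recency-sensitive queries.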


Generating True Relevance Labels in Chinese Search Engine Using Clickthrough Data

AAAI Conferences

In current search engines, ranking functions are learned from a large number of labeled <query, URL> pairs, in which the labels are assigned by human judges and describe how well the URLs match the different queries. However, in commercial search engines, collecting high-quality labels is time-consuming and labor-intensive. To tackle this issue, this paper studies how to produce true relevance labels for <query, URL> pairs using clickthrough data. By analyzing the correlations between query frequency, true relevance labels, and users' behaviors, we demonstrate that users who issue queries of similar frequency have similar search intents and behavioral characteristics. Based on these properties, we propose an efficient discriminative parameter estimation method within a multiple instance learning (MIL) algorithm to automatically produce true relevance labels for <query, URL> pairs. Furthermore, we test our approach on a set of real-world data extracted from a Chinese commercial search engine. Experimental results not only validate the effectiveness of the proposed approach, but also indicate that it is more likely to agree with the aggregation of multiple judgments when strong disagreements exist in the panel of judges. When the panel of judges is in consensus, our approach provides more accurate automatic labels. In contrast with other models, our approach effectively improves the correlation between automatic labels and manual labels.