AITopics

This paper focuses on analyzing and predicting not-answered questions in Community based Question Answering (CQA) services, such as Yahoo! Answers. In CQA services, users express their information needs by submitting natural language questions and await answers from other human users. Comparing to receiving results from web search engines using keyword queries, CQA users are likely to get more specific answers, because human answerers may catch the main point of the question. However, one of the key problems of this pattern is that sometimes no one helps to give answers, while web search engines hardly fail to response. In this paper, we analyze the not-answered questions and give a first try of predicting whether questions will receive answers. More specifically, we first analyze the questions of Yahoo Answers based on the features selected from different perspectives. Then, we formalize the prediction problem as supervised learning – binary classification problem and leverage the proposed features to make predictions. Extensive experiments are made on 76,251 questions collected from Yahoo! Answers. We analyze the specific characteristics of not-answered questions and try to suggest possible reasons for why a question is not likely to be answered. As for prediction, the experimental results show that classification based on the proposed features outperforms the simple word-based approach significantly.

category, machine learning, question answering, (22 more...)

Twenty-Fifth AAAI Conference on Artificial Intelligence

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.62)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)

Generating True Relevance Labels in Chinese Search Engine Using Clickthrough Data

Song, Hengjie (Nanyang Technological University) | Miao, Chunyan (Nanyang Technological University) | Shen, Zhiqi (Nanyang Technological University)

In current search engines, ranking functions are learned from a large number of labeled <query, URL> pairs in which the labels are assigned by human judges, describing how well the URLs match the different queries. However in commercial search engines, collecting high quality labels is time-consuming and labor-intensive. To tackle this issue, this paper studies how to produce the true relevance labels for <query, URL> pairs using clickthrough data. By analyzing the correlations between query frequency, true relevance labels and users’ behaviors, we demonstrate that the users who search the queries with similar frequency have similar search intents and behavioral characteristics. Based on such properties, we propose an efficient discriminative parameter estimation in a multiple instance learning algorithm (MIL) to automatically produce true relevance labels for <query, URL> pairs. Furthermore, we test our approach using a set of real world data extracted from a Chinese commercial search engine. Experimental results not only validate the effectiveness of the proposed approach, but also indicate that our approach is more likely to agree with the aggregation of the multiple judgments when strong disagreements exist in the panel of judges. In the event that the panel of judges is consensus, our approach provides more accurate automatic label results. In contrast with other models, our approach effectively improves the correlation between automatic labels and manual labels.

information retrieval, machine learning, natural language, (20 more...)

Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
North America > United States > Nevada (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(5 more...)

Genre: Research Report (0.66)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Fast Query Recommendation by Search

Jiang, Qixia (Tsinghua University) | Sun, Maosong (Tsinghua University)

Query recommendation can not only effectively facilitate users to obtain their desired information but alsoincrease ads’ click-through rates. This paper presentsa general and highly efficient method for query recommendation. Given query sessions, we automatically generate many similar and dissimilar query-pairs as the prior knowledge. Then we learn a transformation from the prior knowledge to move similar queries closer such that similar queries tend to have similar hash values.This is formulated as minimizing the empirical error on the prior knowledge while maximizing the gap between the data and some partition hyperplanes randomly generated in advance. In the recommendation stage, we search queries that have similar hash values to the given query, rank the found queries and return the top K queries as the recommendation result. All the experimental results demonstrate that our method achieves encouraging results in terms of efficiency and recommendation performance.

data mining, information retrieval, machine learning, (22 more...)

Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Detecting Multilingual and Multi-Regional Query Intent in Web Search

Chang, Yi (Yahoo! Labs) | Zhang, Ruiqiang (Yahoo! Labs) | Reddy, Srihari (Yahoo! Labs) | Liu, Yan (University of Southern California)

With rapid growth of commercial search engines, detecting multilingual and multi-regional intent underlying search queries becomes a critical challenge to serve international users with diverse language and region requirements. We introduce a query intent probabilistic model, whose input is the number of clicks on documents from different regions and in different language, while the output of this model is a smoothed probabilistic distribution of multilingual and multi-regional query intent. Based on an editorial test to evaluate the accuracy of the intent classifier, our probabilistic model could improve the accuracy of multilingual intent detection for 15%, and improve multi-regional intent detection for 18%. To improve web search quality, we propose a set of new ranking features to combine multilingual and multi-regional query intent with document language/region attributes, and apply different approaches in integrating intent information to directly affect ranking. The experiments show that the novel features could provide 2.31% NDCG@1 improvement and 1.81% NDCG@5 improvement.

artificial intelligence, information retrieval, natural language, (18 more...)

Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Taiwan (0.07)
Asia > China > Hong Kong (0.05)
(4 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.86)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Markov Logic Sets: Towards Lifted Information Retrieval Using PageRank and Label Propagation

Neumann, Marion (Fraunhofer IAIS) | Ahmadi, Babak (Fraunhofer IAIS) | Kersting, Kristian (Fraunhofer IAIS)

Inspired by “GoogleTM Sets” and Bayesian sets, we consider the problem of retrieving complex objects and relations among them, i.e., ground atoms from a logical concept, given a query consisting of a few atoms from that concept. We formulate this as a within-network relational learning problem using few labels only and describe an algorithm that ranks atoms using a score based on random walks with restart (RWR): the probability that a random surfer hits an atom starting from the query atoms. Specifically, we compute an initial ranking using personalized PageRank. Then, we find paths of atoms that are connected via their arguments, variablize the ground atoms in each path, in order to create features for the query. These features are used to re-personalize the original RWR and to finally compute the set completion, based on Label Propagation. Moreover, we exploit that RWR techniques can naturally be lifted and show that lifted inference for label propagation is possible. We evaluate our algorithm on a realworld relational dataset by finding completions of sets of objects describing the Roman city of Pompeii. We compare to Bayesian sets and show that our approach gives very reasonable set completions.

information retrieval, logic & formal reasoning, machine learning, (19 more...)

Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

Europe > Italy (0.04)
Europe > Germany (0.04)

Industry: Education (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)

Journal of Artificial Intelligence ResearchJul-29-2011

The Opposite of Smoothing: A Language Model Approach to Ranking Query-Specific Document Clusters

Kurland, O., Krikon, E.

Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as a means for improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model utilizes also information induced from documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high relevant-document percentage. Furthermore, using the model to produce document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed. The performance also favorably compares with that of a state-of-the-art pseudo-feedback-based retrieval method.

clustranker, croft, kurland & lee, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3327

AI Access Foundation

10714

Journal of Artificial Intelligence Research

Country:

Asia > Middle East > Israel (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Shahaf, Dafna (Carnegie Mellon University) | Guestrin, Carlos (Carnegie Mellon University)

Connecting the Dots Between News Articles

The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today’s society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this paper, we investigate methods for automatically connecting the dots – providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events leading from the decline of home prices (2007) to the health-care debate (2009). We formalize the characteristics of a good chain and provide efficient algorithms to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithm's effectiveness in helping users understanding the news.

algorithm, coherence, objective, (17 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Asia > Afghanistan (0.05)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Industry:

Law (0.95)
Health & Medicine (0.88)
Government > Military (0.66)
Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

A New Search Engine Integrating Hierarchical Browsing and Keyword Search

Kuang, Da (The University of Western Ontario) | Li, Xiao (The University of Western Ontario) | Ling, Charles X. (The University of Western Ontario)

The original Yahoo! search engine consists of manually organized topic hierarchy of webpages for easy browsing. Modern search engines (such as Google and Bing), on the other hand, return a flat list of webpages based on keywords. It would be ideal if hierarchical browsing and keyword search can be seamlessly combined. The main difficulty in doing so is to automatically (i.e., not manually) classify and rank a massive number of webpages into various hierarchies (such as topics, media types, regions of the world). In this paper we report our attempt towards building this integrated search engine, called SEE (Search Engine with hiErarchy). We implement a hierarchical classification system based on Support Vector Machines, and embed it in SEE. We also design a novel user interface that allows users to dynamically adjust their desire for a higher accuracy vs. more results in any (sub)category of the hierarchy. Though our current search engine is still small (indexing about 1.2 million webpages), the results, including a small user study, have shown a great promise for integrating such techniques in the next-generation search engine.

category, search engine, webpage, (15 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China (0.06)
North America > United States > New York > New York County > New York City (0.05)
(2 more...)

Industry: Leisure & Entertainment > Sports (0.47)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.87)

Mining User Dwell Time for Personalized Web Search Re-Ranking

Xu, Songhua (Oak Ridge National Laboratory) | Jiang, Hao (The University of Hong Kong) | Lau, Francis Chi-Moon (The University of Hong Kong)

We propose a personalized re-ranking algorithm through mining user dwell times derived from a user's previously online reading or browsing activities. We acquire document level user dwell times via a customized web browser, from which we then infer concept word level user dwell times in order to understand a user's personal interest. According to the estimated concept word level user dwell times, our algorithm can estimate a user's potential dwell time over a new document, based on which personalized webpage re-ranking can be carried out. We compare the rankings produced by our algorithm with rankings generated by popular commercial search engines and a recently proposed personalized ranking algorithm. The results clearly show the superiority of our method.

algorithm, dwell time, user dwell time, (14 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)

Learning Inter-Related Statistical Query Translation Models for English-Chinese Bi-Directional CLIR

Zhang, Yuejie (Fudan University) | Cen, Lei (Fudan University) | Jin, Cheng (Fudan University) | Xue, Xiangyang (Fudan University) | Fan, Jianping (The University of North Carolina at Charlotte)

To support more precise query translation for English-Chinese Bi-Directional Cross-Language Information Retrieval (CLIR), we have developed a novel framework by integrating a semantic network to characterize the correlations between multiple inter-related text terms of interest and learn their inter-related statistical query translation models. First, a semantic network is automatically generated from large-scale English-Chinese bilingual parallel corpora to characterize the correlations between a large number of text terms of interest. Second, the semantic network is exploited to learn the statistical query translation models for such text terms of interest. Finally, these inter-related query translation models are used to translate the queries more precisely and achieve more effective CLIR. Our experiments on a large number of official public data have obtained very positive results.

semantic network, text term, translation, (16 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > North Carolina (0.04)
Asia > China > Hong Kong (0.04)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)
Health & Medicine > Therapeutic Area > Immunology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)