 Information Retrieval


A Content-Based Method to Enhance Tag Recommendation

AAAI Conferences

Tagging has become a primary tool for users to organize and share digital content on many social media sites. In addition, tag information has been shown to enhance the capabilities of existing search engines. However, many resources on the web still lack tag information. This paper proposes a content-based approach to tag recommendation which can be applied to webpages with or without prior tag information. While social bookmarking services such as Delicious enable users to share annotated bookmarks, tag recommendation is available only for pages with tags specified by other users. Our proposed approach is motivated by the observation that similar webpages tend to have the same tags. Each webpage can therefore share its tags with similar webpages. The propagation of a tag depends on its weight in the originating webpage and the similarity between the sending and receiving webpages. The similarity metric between two webpages is defined as a linear combination of four cosine similarities, taking into account both tag information and page content. Experiments using data crawled from Delicious show that the proposed method is effective in populating untagged webpages with the correct tags.
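
The propagation idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which four cosine similarities are combined, how they are weighted, or how a tag's weight and the page similarity are combined, so the view names (`view1` to `view4`), the coefficients, and the product rule in `propagate_tags` are all assumptions made for illustration, not the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def page_similarity(page_a, page_b, coeffs):
    """Linear combination of per-view cosine similarities.

    page_a, page_b: dicts mapping a view name to a sparse vector.
    coeffs: dict mapping a view name to its coefficient (hypothetical values).
    """
    return sum(c * cosine(page_a.get(view, {}), page_b.get(view, {}))
               for view, c in coeffs.items())

def propagate_tags(source_tags, similarity):
    """Assumed rule: a propagated tag weight is the source weight times the similarity."""
    return {tag: weight * similarity for tag, weight in source_tags.items()}

# Hypothetical views and equal coefficients, for illustration only.
coeffs = {"view1": 0.25, "view2": 0.25, "view3": 0.25, "view4": 0.25}
tagged = {"view1": {"python": 1.0}, "view2": {"code": 1.0},
          "view3": {"tutorial": 1.0}, "view4": {"guide": 1.0}}
untagged = {"view1": {"python": 1.0}, "view2": {"code": 1.0}}

sim = page_similarity(tagged, untagged, coeffs)
received = propagate_tags({"programming": 0.8}, sim)
```

In this toy example the two pages agree on two of the four views, so the untagged page receives the tag `programming` with a reduced weight.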


Learning Conditional Preference Networks with Queries

AAAI Conferences

We investigate the problem of eliciting CP-nets in the well-known model of exact learning with equivalence and membership queries. The goal is to identify a preference ordering with a binary-valued CP-net by guiding the user through a sequence of queries. Each example is a dominance test on some pair of outcomes. In this setting, we show that acyclic CP-nets are not learnable with equivalence queries alone, while they are learnable with the help of membership queries if the supplied examples are restricted to swaps. A similar property holds for tree CP-nets with arbitrary examples. In fact, membership queries allow us to provide attribute-efficient algorithms for which the query complexity is only logarithmic in the number of attributes. Such results highlight the utility of this model for eliciting CP-nets in large multi-attribute domains.


Transfer Learning using Task-Level Features with Application to Information Retrieval

AAAI Conferences

We propose a probabilistic transfer learning model that uses task-level features to control the task mixture selection in a hierarchical Bayesian model. These task-level features, although rarely used in existing approaches, can provide additional information to model complex task distributions and allow effective transfer to new tasks, especially when only a limited amount of data is available. To estimate the model parameters, we develop an empirical Bayes method based on variational approximation techniques. Our experiments on information retrieval show that the proposed model achieves significantly better performance compared with other transfer learning methods.


Exploiting Multi-Modal Interactions: A Unified Framework

AAAI Conferences

Given an imagebase with tagged images, four types of tasks can be executed, i.e., content-based image retrieval, image annotation, text-based image retrieval, and query expansion. For any of these tasks, the similarity between objects of the type concerned is essential. In this paper, we propose a framework to tackle these four tasks from a unified view. The essence of the framework is to estimate similarities by exploiting the interactions between objects of different modalities. Experiments show that the proposed method can improve similarity estimation, and based on the improved similarity estimation, some simple methods can achieve better performance than some state-of-the-art techniques.


Unsupervised Rank Aggregation with Domain-Specific Expertise

AAAI Conferences

Consider the setting where a panel of judges is repeatedly asked to (partially) rank sets of objects according to given criteria, and assume that the judges' expertise depends on the objects' domain. Learning to aggregate their rankings with the goal of producing a better joint ranking is a fundamental problem in many areas of Information Retrieval and Natural Language Processing, amongst others. However, supervised ranking data is generally difficult to obtain, especially if coming from multiple domains. Therefore, we propose a framework for learning to aggregate votes of constituent rankers with domain specific expertise without supervision. We apply the learning framework to the settings of aggregating full rankings and aggregating top-k lists, demonstrating significant improvements over a domain-agnostic baseline in both cases.
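
To make the aggregation setting concrete, here is a minimal sketch of combining several judges' rankings with per-judge weights. It uses a weighted Borda count, which is a simple stand-in and not the learning framework the abstract describes; in particular the weights are supplied by hand here, whereas the paper learns domain-specific expertise without supervision. The function name and the example judges are hypothetical.

```python
from collections import defaultdict

def weighted_borda(rankings, weights):
    """Aggregate rankings with a weighted Borda count.

    rankings: dict mapping a judge to a list of items, best first
              (lists may be partial, e.g. top-k).
    weights:  dict mapping a judge to a non-negative weight standing in
              for that judge's expertise in the current domain.
    Returns the items sorted by descending aggregate score.
    """
    scores = defaultdict(float)
    for judge, ranking in rankings.items():
        n = len(ranking)
        for position, item in enumerate(ranking):
            # The top item of an n-item list earns n points, the last earns 1.
            scores[item] += weights.get(judge, 0.0) * (n - position)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))

rankings = {"judge1": ["a", "b", "c"], "judge2": ["c", "b", "a"]}
trust_judge1 = weighted_borda(rankings, {"judge1": 2.0, "judge2": 1.0})
trust_judge2 = weighted_borda(rankings, {"judge1": 1.0, "judge2": 2.0})
```

Switching which judge carries more weight reverses the joint ranking, which is the effect domain-dependent expertise is meant to capture.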


Ranking Structured Documents: A Large Margin Based Approach for Patent Prior Art Search

AAAI Conferences

We propose an approach for automatically ranking structured documents applied to patent prior art search. Our model, SVM Patent Ranking (SVM_PR), incorporates margin constraints that directly capture the specificities of patent citation ranking. Our approach combines patent domain knowledge features with meta-score features from several different general Information Retrieval methods. The training algorithm is an extension of the Pegasos algorithm with performance guarantees, effectively handling hundreds of thousands of patent-pair judgements in a high-dimensional feature space. Experiments on a homogeneous essential wireless patent dataset show that SVM_PR performs on average 30%-40% better than many other state-of-the-art general-purpose Information Retrieval methods in terms of the NDCG measure at different cut-off positions.
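
For readers unfamiliar with the evaluation measure named above, the following is a minimal sketch of NDCG at a cut-off position k. Several variants of the measure exist (for example linear gain versus an exponential gain of 2^rel - 1), and the abstract does not state which one the authors use; this sketch assumes linear gain with a log2 rank discount.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (linear gain assumed)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of the returned documents, in ranked order.
perfect = ndcg_at_k([3, 2, 1], k=3)
swapped = ndcg_at_k([0, 1], k=2)
```

A ranking that already lists documents in decreasing relevance scores 1.0, and placing the only relevant document second instead of first lowers the score to 1 / log2(3).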


Declarative Programming of Search Problems with Built-in Arithmetic

AAAI Conferences

We address the problem of providing a logical formalization of arithmetic in declarative modelling languages for NP search problems. The challenge is to simultaneously allow quantification over an infinite domain such as the natural numbers, provide natural modelling facilities, and control expressive power of the language. To address the problem, we introduce an extension of the model expansion (MX) based framework to finite structures embedded in an infinite secondary structure, together with "double-guarded" logics for representing MX specifications for these structures. The logics also contain multi-set functions (aggregate operations). Our main result is that these logics capture the complexity class NP on "small-cost" arithmetical structures.


Complex Question Answering: Unsupervised Learning Approaches and Experiments

Journal of Artificial Intelligence Research

Complex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topic-oriented, informative multi-document summarization where the goal is to produce a single text as a compressed version of a set of documents with a minimum loss of relevant information. In this paper, we experiment with one empirical method and two unsupervised statistical machine learning techniques: K-means and Expectation Maximization (EM), for computing relative importance of the sentences. We compare the results of these approaches. Our experiments show that the empirical approach outperforms the other two techniques and EM performs better than K-means. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. In order to measure the importance and relevance to the user query we extract different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences. We use a local search technique to learn the weights of the features. To the best of our knowledge, no study has used tree kernel functions to encode syntactic/semantic information for more complex tasks such as computing the relatedness between the query sentences and the document sentences in order to generate query-focused summaries (or answers to complex questions). For each of our methods of generating summaries (i.e. empirical, K-means and EM) we show the effects of syntactic and shallow-semantic features over the bag-of-words (BOW) features.


Tuning Search Heuristics for Classical Planning with Macro Actions

AAAI Conferences

This paper proposes a new approach to improve domain-independent heuristic state space search planners for classical planning by tuning the search heuristics using macro actions of length two extracted from sample plans. This idea is implemented in the planner AltAlt, and the new planner, Macro-AltAlt, is tested on the domains introduced for the learning track of the International Planning Competition (IPC-2008). The performance of Macro-AltAlt, measured by the length of the plan found and the number of states explored to find the plan, is compared with that of AltAlt.


Improving Biomedical Document Retrieval by Mining Domain Knowledge

AAAI Conferences

When research articles introduce new findings or concepts, they typically relate them only to knowledge and domain concepts of immediate relevance. However, many domain concepts relevant to the article and its findings are omitted from the text. This may prevent us from retrieving articles of interest when executing a search query. Approaches such as probabilistic latent semantic indexing (PLSI) overcome this limitation by projecting terms in articles to a lower-dimensional latent space and identifying the best possible matches in this space. Nevertheless, this approach may not perform well enough if the number of explicit knowledge concepts in the articles is too small compared to the amount of knowledge in the domain. The objective of this paper is to address the problem by exploiting a domain knowledge layer: a rich network of associations among knowledge concepts in the domain of interest. We present a new document retrieval framework that i) extracts associations among knowledge concepts from many documents in the literature corpus and ii) exploits them to improve the retrieval of relevant documents. We test our approach on the problem of retrieval of biomedical documents and show that it outperforms standard Lucene and BM25 information-retrieval methods.