Goto

Collaborating Authors

 Information Retrieval


Two-Layer Generalization Analysis for Ranking Using Rademacher Average

Neural Information Processing Systems

This paper is concerned with the generalization analysis on learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents per query. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of algorithms. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d sampling of documents per query. Such a sampling can better describe the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning to rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query become no longer independent if represented by features extracted from the matching between document and query. To tackle the challenge, we decompose the generalization error according to the two layers, and make use of the new concept of two-layer Rademacher average. The generalization bounds we obtained are quite intuitive and are in accordance with previous empirical studies on the performance of ranking algorithms.


Ontology-based Queries over Cancer Data

arXiv.org Artificial Intelligence

The ever-increasing amount of data in biomedical research, and in cancer research in particular, needs to be managed to support efficient data access, exchange and integration. Existing software infrastructures, such caGrid, support access to distributed information annotated with a domain ontology. However, caGrid's current querying functionality depends on the structure of individual data resources without exploiting the semantic annotations. In this paper, we present the design and development of an ontology-based querying functionality that consists of: the generation of OWL2 ontologies from the underlying data resources metadata and a query rewriting and translation process based on reasoning, which converts a query at the domain ontology level into queries at the software infrastructure level. We present a detailed analysis of our approach as well as an extensive performance evaluation. While the implementation and evaluation was performed for the caGrid infrastructure, the approach could be applicable to other model and metadata-driven environments for data sharing.



Explanation of Relevance Judgement Discrepancy with Quantum Interference

AAAI Conferences

A key concept in many Information Retrieval (IR) tasks, e.g. document indexing, query language modelling, aspect and diversity retrieval, is the relevance measurement of topics, i.e. to what extent an information object (e.g. a document or a query) is about the topics. This paper investigates the interference of relevance measurement of a topic caused by another topic. For example, consider that two user groups are required to judge whether a topic q is relevant to a document d, and q is presented together with another topic (referred to as a companion topic). If different companion topics are used for different groups, interestingly different relevance probabilities of q given d can be reached. In this paper, we present empirical results showing that the relevance of a topic to a document is greatly affected by the companion topicโ€™s relevance to the same document, and the extent of the impact differs with respect to different companion topics. We further analyse the phenomenon from classical and quantum-like interference perspectives, and connect the phenomenon to nonreality and contextuality in quantum mechanics. We demonstrate that quantum like model fits in the empirical data, could be potentially used for predicting the relevance when interference exists.


Logical Leaps and Quantum Connectives: Forging Paths through Predication Space

AAAI Conferences

๏ปฟ๏ปฟ๏ปฟ๏ปฟ๏ปฟThe Predication-based Semantic Indexing (PSI) approach encodes both symbolic and distributional information into a semantic space using a permutation-based variant of Random Indexing. In this paper, we develop and evaluate a computational model of abductive reasoning based on PSI. Using distributional information, we identify pairs of concepts that are likely to be predicated about a common third concept, or middle term. As this occurs without the explicit identification of the middle term concerned, we refer to this process as a โ€œlogical leapโ€. Subsequently, we use further operations in the PSI space to retrieve this middle term and identify the predicate types involved. On evaluation using a set of 1000 randomly selected cue concepts, the model is shown to retrieve with accuracy concepts that can be connected to a cue concept by a middle term, as well as the middle term concerned, using nearest-neighbor search in the PSI space. The utility of quantum logical operators as a means to identify alternative paths through this space is also explored.


An Agent based Approach towards Metadata Extraction, Modelling and Information Retrieval over the Web

arXiv.org Artificial Intelligence

Web development is a challenging research area for its creativity and complexity. The existing raised key challenge in web technology technologic development is the presentation of data in machine read and process able format to take advantage in knowledge based information extraction and maintenance [4]. Currently it is not possible to search and extract optimized results using full text queries because there is no such mechanism exists which can fully extract the semantic from full text queries and then look for particular knowledge based information. Mechanism of presenting information over the web in a format so that the humans as well as machines can understand the context leads to the concept of Semantic Web introduced by Tim Berners Lee [4]. Semantic web is a linked mesh of information to produce technologies capable of reasoning on semi structured information and processed by machines [4].


Semantic Oriented Agent based Approach towards Engineering Data Management, Web Information Retrieval and User System Communication Problems

arXiv.org Artificial Intelligence

The four intensive problems to the software rose by the software industry .i.e., User System Communication / Human Machine Interface, Meta Data extraction, Information processing & management and Data representation are discussed in this research paper. To contribute in the field we have proposed and described an intelligent semantic oriented agent based search engine including the concepts of intelligent graphical user interface, natural language based information processing, data management and data reconstruction for the final user end information representation.


Keyword Extraction and Headline Generation Using Novel Word Features

AAAI Conferences

We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. With the Wikipedia search result article set, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications on individual words' importance in the input document. They serve as nice complements to the traditional word features derivable from explicit information of a document. In addition, we also introduce a word-document fitness feature to charcterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation by experiments and have obtained very encouraging results.


Natural Language Aided Visual Query Building for Complex Data Access

AAAI Conferences

Over the past decades, there have been significant efforts on developing robust and easy-to-use query interfaces to databases. So far, the typical query interfaces are GUI-based visual query interfaces. Visual query interfaces however, have limitations especially when they are used for accessing large and complex datasets. Therefore, we are developing a novel query interface where users can use natural language expressions to help author visual queries. Our work enhances the usability of a visual query interface by directly addressing the "knowledge gap" issue in visual query interfaces. We have applied our work in several real-world applications. Our preliminary evaluation demonstrates the effectiveness of our approach.


Ontological Reasoning with F-logic Lite and its Extensions

AAAI Conferences

Answering queries posed over knowledge bases is a central problem in knowledge representation and database theory. In the database area, checking query containment is an important query optimization and schema integration technique. In knowledge representation it has been used for object classification, schema integration, service discovery, and more. In the presence of a knowledge base, the problem of query containment is strictly related to that of query answering; indeed, the two are reducible to each other; we focus on the latter, and our results immediately extend to the former.