 Information Retrieval


Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy

arXiv.org Artificial Intelligence

Expert finding is an information retrieval task concerned with the search for the most knowledgeable people on some topic, based on documents describing people's activities. The task involves taking a user query as input and returning a list of people sorted by their level of expertise regarding the user query. This paper introduces a novel approach for combining multiple estimators of expertise based on a multisensor data fusion framework together with the Dempster-Shafer theory of evidence and Shannon's entropy. More specifically, we defined three sensors which detect heterogeneous information derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the academic experts. Given the evidence collected, each sensor may identify different candidates as experts, and consequently the sensors may not agree on a final ranking decision. To deal with these conflicts, we applied the Dempster-Shafer theory of evidence combined with Shannon's entropy formula to fuse this information and come up with a more accurate and reliable final ranking list. Experiments on two datasets of academic publications from the Computer Science domain attest to the adequacy of the proposed approach over the traditional state-of-the-art approaches. We also ran experiments against representative supervised state-of-the-art algorithms. Results revealed that the proposed method achieved a similar performance when compared to these supervised techniques, confirming the capabilities of the proposed framework.
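The two building blocks named in this abstract can be illustrated with a minimal sketch. It shows Dempster's rule of combination for two mass functions and Shannon's entropy of a probability distribution. The abstract does not say how the paper couples the two, so the sketch keeps them separate; the function names `combine` and `shannon_entropy` and the candidate names are assumptions made for illustration only.

```python
import math
from itertools import product


def combine(m1, m2):
    """Dempster's rule: fuse two mass functions (dict: frozenset -> mass)."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += wa * wb
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    # Normalise by 1 - K so the fused masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical example: two sensors weighing two candidate experts.
alice, bob = frozenset({"alice"}), frozenset({"bob"})
either = alice | bob
text_sensor = {alice: 0.6, either: 0.4}
citation_sensor = {bob: 0.5, either: 0.5}
fused = combine(text_sensor, citation_sensor)
```

In this toy case the conflict mass is 0.3, so the fused masses are the products of agreeing pairs divided by 0.7. A sensor whose scores are spread almost uniformly over the candidates has high entropy, which is one plausible signal of how uninformative that sensor is.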


SHARE: A Web Service Based Framework for Distributed Querying and Reasoning on the Semantic Web

arXiv.org Artificial Intelligence

Here we describe the SHARE system, a web service based framework for distributed querying and reasoning on the semantic web. The main innovations of SHARE are: (1) the extension of a SPARQL query engine to perform on-demand data retrieval from web services, and (2) the extension of an OWL reasoner to test property restrictions by means of web service invocations. In addition to enabling queries across distributed datasets, the system allows for a target dataset that is significantly larger than is possible under current, centralized approaches. Although the architecture is equally applicable to all types of data, the SHARE system targets bioinformatics, due to the large number of interoperable web services that are already available in this area. SHARE is built entirely on semantic web standards, and is the successor of the BioMOBY project.


Towards Finding Relevant Information Graphics: Identifying the Independent and Dependent Axis from User-Written Queries

AAAI Conferences

Information graphics (non-pictorial graphics such as bar charts and line graphs) contain a great deal of knowledge. Information retrieval research has focused on retrieving textual documents and on extracting images based on words appearing in the accompanying article or based on low-level features such as color or texture. Our goal is to build a system for retrieving information graphics that reasons about the content of the graphic itself in deciding its relevance to the user query. As a first step, we aim to identify, from a full sentence user query, what should be depicted on the independent and dependent axes of potentially relevant graphs. Natural language processing techniques are used to extract features from the query and machine learning is employed to build a model for hypothesizing the content of the axes. Results have shown that our models can achieve accuracy higher than 80% on a corpus of collected user queries.


Automated Non-Content Word List Generation Using hLDA

AAAI Conferences

In this paper, we present a language-independent method for the automatic, unsupervised extraction of non-content words from a corpus of documents. This method permits the creation of word lists that may be used in place of traditional function word lists in various natural language processing tasks. As an example we generated lists of words from a corpus of English, Chinese, and Russian posts extracted from Wikipedia articles and Wikipedia Wikitalk discussion pages. We applied these lists to the task of authorship attribution on this corpus to compare the effectiveness of lists of words extracted with this method to expert-created function word lists and frequent word lists (a common alternative to function word lists). hLDA lists perform comparably to frequent word lists. The trials also show that corpus-derived lists tend to perform better than more generic lists, and both sets of generated lists significantly outperformed the expert lists. Additionally, we evaluated the performance of an English expert list on machine translations of our Chinese and Russian documents, showing that our method also outperforms this alternative.
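For context, the frequent word list baseline mentioned in this abstract can be sketched in a few lines. This is not the hLDA method itself, only the simpler alternative it is compared against. Whitespace tokenisation and the function name `frequent_word_list` are assumptions; whitespace splitting would not work as-is for Chinese text.

```python
from collections import Counter


def frequent_word_list(documents, k):
    """Return the k most frequent lower-cased tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical two-document corpus.
corpus = ["the cat sat on the mat", "the dog and the cat"]
top_words = frequent_word_list(corpus, 2)
```

Building the list from the target corpus rather than from a generic source matches the abstract's observation that corpus-derived lists tend to perform better than more generic ones.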


Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization

arXiv.org Machine Learning

Nonnegative Matrix Factorization (NMF) has been continuously evolving in several areas, such as pattern recognition and information retrieval. It factorizes a matrix into a product of two low-rank nonnegative matrices that define a parts-based, linear representation of nonnegative data. Recently, Graph regularized NMF (GrNMF) was proposed to find a compact representation which uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In GrNMF, an affinity graph is constructed from the original data space to encode the geometrical information. In this paper, we propose a novel idea which applies a Multiple Kernel Learning approach to refine the graph structure so that it reflects the factorization of the matrix and the new data space. The GrNMF is improved by utilizing the graph refined by the kernel learning, and then a novel kernel learning method is introduced under the GrNMF framework. The proposed algorithm shows encouraging results in comparison to state-of-the-art clustering algorithms such as NMF, GrNMF, and SVD.
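The plain NMF step that this abstract builds on can be sketched with the classical multiplicative update rules for the squared Frobenius error. This is only the unregularized baseline: the graph regularizer and the multiple kernel learning step are not shown. The helper names, the random initialisation, and the small `eps` guard are assumptions of this sketch.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive start so the multiplicative updates can move entries.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


# Hypothetical 3 x 3 nonnegative data matrix, factorized at rank 2.
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]
W, H = nmf(V, rank=2)
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, both factors stay nonnegative throughout, which is what yields the parts-based representation.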


An Architecture for Probabilistic Concept-Based Information Retrieval

arXiv.org Artificial Intelligence

While concept-based methods for information retrieval can provide improved performance over more conventional techniques, they require large amounts of effort to acquire the concepts and their qualitative and quantitative relationships. This paper discusses an architecture for probabilistic concept-based information retrieval which addresses the knowledge acquisition problem. The architecture makes use of the probabilistic networks technology for representing and reasoning about concepts and includes a knowledge acquisition component which partially automates the construction of concept knowledge bases from data. We describe two experiments that apply the architecture to the task of retrieving documents about terrorism from a set of documents from the Reuters news service. The experiments provide positive evidence that the architecture design is feasible and that there are advantages to concept-based methods.


Machine Learning, Clustering, and Polymorphy

arXiv.org Artificial Intelligence

This paper describes a machine induction program (WITT) that attempts to model human categorization. Properties of categories to which human subjects are sensitive include best or prototypical members, relative contrasts between putative categories, and polymorphy (neither necessary nor sufficient features). This approach represents an alternative to usual Artificial Intelligence approaches to generalization and conceptual clustering which tend to focus on necessary and sufficient feature rules, equivalence classes, and simple search and match schemes. WITT is shown to be more consistent with human categorization while potentially including results produced by more traditional clustering schemes. Applications of this approach in the domains of expert systems and information retrieval are also discussed.


An Uncertainty Management Calculus for Ordering Searches in Distributed Dynamic Databases

arXiv.org Artificial Intelligence

MINDS is a distributed system of cooperating query engines that customize document retrieval for each user in a dynamic environment. It improves its performance and adapts to changing patterns of document distribution by observing system-user interactions and modifying the appropriate certainty factors, which act as search control parameters. It is argued here that the uncertainty management calculus must account for temporal precedence, reliability of evidence, degree of support for a proposition, and saturation effects. The calculus presented here possesses these features. Some results obtained with this scheme are discussed.


Disease Detection and Symptom Tracking by Retrieving Information from the Web

AAAI Conferences

This paper proposes techniques for preliminary disease detection and personal symptom tracking that adopt concepts and methods of web information retrieval. The proposed approaches are inspired by web users’ behavior: people look for information about symptoms on the Internet. Therefore, using information in Web pages, the developed system proposes possible diseases related to one or more queried symptoms. Moreover, these queried symptoms are recorded in the query log so that users can utilize these records to trace the history of their symptoms, and further to manage their own health or provide the records to doctors as a reference. As ranking detected diseases requires professional knowledge, we instead evaluate the relevancy of retrieved sentences containing detected diseases under both strict and lenient metrics. Experimental results support the proposed ranking approach. The techniques described in this paper are also implemented in an Android application called “Health Generation”. In this application, the detected disease is further linked to its Wikipedia introduction and the nearby clinics are listed. Users can utilize the GPS function provided by cell phones to plan a route to these clinics. The aim of this research is to provide medical information and solutions according to users’ needs through the proposed approaches and the application, and further to help users manage their health.


Visualizing and Interacting with Concept Hierarchies

arXiv.org Machine Learning

Concept Hierarchies and Formal Concept Analysis are theoretically well-grounded and extensively tested methods. They rely on line diagrams called Galois lattices for visualizing and analysing object-attribute sets. Galois lattices are visually appealing and conceptually rich for experts. However, they present important drawbacks due to their concept-oriented overall structure: analysing what they show is difficult for non-experts, navigation is cumbersome, interaction is poor, and scalability is a deep bottleneck for visual interpretation even for experts. In this paper we introduce semantic probes as a means to overcome many of these problems and extend usability and application possibilities of traditional FCA visualization methods. Semantic probes are visual user-centred objects which extract and organize reduced Galois sub-hierarchies. They are simpler and clearer, and they provide better navigation support through a rich set of interaction possibilities. Since probe-driven sub-hierarchies are limited to the user's focus, scalability is under control and interpretation is facilitated. After some successful experiments, several applications are being developed, with the remaining problem of finding a compromise between simplicity and conceptual expressivity.
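To make the object-attribute sets behind a Galois lattice concrete, the following minimal sketch enumerates the formal concepts of a tiny context by closing every attribute subset. It illustrates standard Formal Concept Analysis only, not the semantic probes introduced in the paper. The function names and the example context are assumptions, and the brute-force enumeration is exponential in the number of attributes, so it suits toy inputs only.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Return all (extent, intent) pairs of a context (dict: object -> set)."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), r):
            # Closing an attribute set yields a concept; duplicates collapse.
            objs = extent(context, frozenset(attrs))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts


# Hypothetical context: three animals and two attributes.
animals = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}
lattice_nodes = formal_concepts(animals)
```

Each resulting pair is one node of the lattice. Even this three-object context yields four concepts, which hints at why full lattices become hard to read as contexts grow.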