AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for Internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy

Moreira, Catarina, Wichert, Andreas

arXiv.org Artificial Intelligence, Jun-12-2013

Expert finding is an information retrieval task concerned with the search for the most knowledgeable people on some topic, based on documents describing people's activities. The task involves taking a user query as input and returning a list of people sorted by their level of expertise regarding the user query. This paper introduces a novel approach for combining multiple estimators of expertise based on a multisensor data fusion framework together with the Dempster-Shafer theory of evidence and Shannon's entropy. More specifically, we defined three sensors which detect heterogeneous information derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the academic experts. Given the evidence collected, the sensors may identify different candidates as experts and consequently disagree on the final ranking decision. To deal with these conflicts, we applied the Dempster-Shafer theory of evidence combined with Shannon's entropy formula to fuse this information and come up with a more accurate and reliable final ranking list. Experiments on two datasets of academic publications from the Computer Science domain attest to the adequacy of the proposed approach over the traditional state-of-the-art approaches. We also ran experiments against representative supervised state-of-the-art algorithms. Results revealed that the proposed method achieved similar performance when compared to these supervised techniques, confirming the capabilities of the proposed framework.
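
As a rough illustration of the two ingredients named in this abstract, the sketch below implements Dempster's rule of combination for two mass functions and Shannon's entropy in plain Python. It is not the authors' implementation: the candidate names, the mass values, and the function names `combine` and `shannon_entropy` are assumptions made for this example, and the abstract does not say how the paper uses entropy inside the fusion step.

```python
from itertools import product
from math import log2


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses (a focal set)
    to a mass; the masses of one function are assumed to sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the focal sets.
            fused[common] = fused.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal sets contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in fused.items()}


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# Hypothetical masses from two sensors over candidate experts A and B.
text_sensor = {frozenset("A"): 0.6, frozenset("B"): 0.1, frozenset("AB"): 0.3}
citation_sensor = {frozenset("A"): 0.5, frozenset("B"): 0.2, frozenset("AB"): 0.3}

fused = combine(text_sensor, citation_sensor)
print(fused)
print(shannon_entropy(fused.values()))
```

The mass assigned to the full set `{A, B}` represents a sensor's ignorance. A lower entropy of a sensor's distribution can be read as that sensor being more decisive, which is one plausible way entropy could inform the weighting of sensors.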

artificial intelligence, information retrieval, natural language, (20 more...)

doi: 10.1016/j.eswa.2013.04.001

1306.2864

Country:

North America > United States (0.28)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.68)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)

SHARE: A Web Service Based Framework for Distributed Querying and Reasoning on the Semantic Web

Vandervalk, Ben P, McCarthy, E Luke, Wilkinson, Mark D

arXiv.org Artificial Intelligence, May-20-2013

Here we describe the SHARE system, a web service based framework for distributed querying and reasoning on the semantic web. The main innovations of SHARE are: (1) the extension of a SPARQL query engine to perform on-demand data retrieval from web services, and (2) the extension of an OWL reasoner to test property restrictions by means of web service invocations. In addition to enabling queries across distributed datasets, the system allows for a target dataset that is significantly larger than is possible under current, centralized approaches. Although the architecture is equally applicable to all types of data, the SHARE system targets bioinformatics, due to the large number of interoperable web services that are already available in this area. SHARE is built entirely on semantic web standards, and is the successor of the BioMOBY project.

artificial intelligence, information retrieval query processing, natural language, (19 more...)

1305.4455

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)

Towards Finding Relevant Information Graphics: Identifying the Independent and Dependent Axis from User-Written Queries

Li, Zhuo (University of Delaware) | Stagitis, Matthew (University of Delaware) | McCoy, Kathleen (University of Delaware) | Carberry, Sandra (University of Delaware)

AAAI Conferences, May-19-2013

Information graphics (non-pictorial graphics such as bar charts and line graphs) contain a great deal of knowledge. Information retrieval research has focused on retrieving textual documents and on extracting images based on words appearing in the accompanying article or based on low-level features such as color or texture. Our goal is to build a system for retrieving information graphics that reasons about the content of the graphic itself in deciding its relevance to the user query. As a first step, we aim to identify, from a full-sentence user query, what should be depicted on the independent and dependent axes of potentially relevant graphs. Natural language processing techniques are used to extract features from the query, and machine learning is employed to build a model for hypothesizing the content of the axes. Results show that our models can achieve accuracy higher than 80% on a corpus of collected user queries.

independent and dependent axis, relevant information graphic, user-written query, (1 more...)

The Twenty-Sixth International FLAIRS Conference

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)

Automated Non-Content Word List Generation Using hLDA

Krug, Wayne (Language Computer Corporation) | Tomlinson, Marc T. (Language Computer Corporation)

AAAI Conferences, May-19-2013

In this paper, we present a language-independent method for the automatic, unsupervised extraction of non-content words from a corpus of documents. This method permits the creation of word lists that may be used in place of traditional function word lists in various natural language processing tasks. As an example we generated lists of words from a corpus of English, Chinese, and Russian posts extracted from Wikipedia articles and Wikipedia Wikitalk discussion pages. We applied these lists to the task of authorship attribution on this corpus to compare the effectiveness of lists of words extracted with this method to expert-created function word lists and frequent word lists (a common alternative to function word lists). hLDA lists perform comparably to frequent word lists. The trials also show that corpus-derived lists tend to perform better than more generic lists, and both sets of generated lists significantly outperformed the expert lists. Additionally, we evaluated the performance of an English expert list on machine translations of our Chinese and Russian documents, showing that our method also outperforms this alternative.
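
The frequent word lists that this abstract uses as a baseline can be sketched in a few lines. The example below is only an illustration of that baseline, not of the hLDA method: the whitespace tokenisation, the toy documents, and the function names `frequent_word_list` and `feature_vector` are assumptions made for this sketch.

```python
from collections import Counter


def frequent_word_list(documents, k):
    """Return the k most frequent lowercase tokens across all documents."""
    counts = Counter(
        token for doc in documents for token in doc.lower().split()
    )
    return [word for word, _ in counts.most_common(k)]


def feature_vector(document, word_list):
    """Relative frequency of each listed word in one document.

    Vectors like this are a common input for authorship attribution.
    """
    tokens = document.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in word_list]


corpus = ["the cat sat on the mat", "the dog and the cat"]
words = frequent_word_list(corpus, 2)
print(words)
print(feature_vector("the cat saw the dog", words))
```

A realistic pipeline would use a language-aware tokeniser, since whitespace splitting does not segment Chinese text.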

automated non-content word list generation, hlda

The Twenty-Sixth International FLAIRS Conference

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.60)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.53)

Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization

Wang, Jing-Yan, AbdulJabbar, Mustafa

arXiv.org Machine Learning, Apr-3-2013

Nonnegative Matrix Factorization (NMF) has evolved continuously in areas such as pattern recognition and information retrieval. It factorizes a matrix into a product of two low-rank nonnegative matrices that define a parts-based, linear representation of nonnegative data. Recently, graph-regularized NMF (GrNMF) was proposed to find a compact representation that uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In GrNMF, an affinity graph is constructed from the original data space to encode the geometrical information. In this paper, we propose a novel idea that uses a Multiple Kernel Learning approach to refine the graph structure so that it reflects the factorization of the matrix and the new data space. GrNMF is improved by utilizing the graph refined by the kernel learning, and a novel kernel learning method is then introduced under the GrNMF framework. The proposed algorithm shows encouraging results in comparison to state-of-the-art clustering algorithms such as NMF, GrNMF, and SVD.
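
To make the basic factorization concrete, here is a minimal pure-Python sketch of plain NMF using the well-known multiplicative update rules for the squared Frobenius error. It covers only the unregularized baseline; the graph regularization and multiple kernel learning described in the abstract are not included. The function names, the random initialisation, and the toy matrix are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_wh in zip(v, wh)
               for x, y in zip(row_v, row_wh))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy nonnegative data with two rough groups of rows.
data = [[5.0, 3.0, 0.0], [4.0, 2.0, 0.0], [0.0, 1.0, 4.0], [0.0, 2.0, 5.0]]
w, h = nmf(data, rank=2)
print(frob_error(data, w, h))
```

Because every update multiplies by a ratio of nonnegative quantities, the factors stay nonnegative without any explicit projection. For real data a NumPy-based implementation would be far faster than these list-based helpers.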

information retrieval, machine learning, natural language, (4 more...)

1208.3845

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.53)

An Architecture for Probabilistic Concept-Based Information Retrieval

Fung, Robert, Crawford, S. L., Appelbaum, Lee A., Tong, Richard M.

arXiv.org Artificial Intelligence, Mar-27-2013

While concept-based methods for information retrieval can provide improved performance over more conventional techniques, they require large amounts of effort to acquire the concepts and their qualitative and quantitative relationships. This paper discusses an architecture for probabilistic concept-based information retrieval which addresses the knowledge acquisition problem. The architecture makes use of the probabilistic networks technology for representing and reasoning about concepts and includes a knowledge acquisition component which partially automates the construction of concept knowledge bases from data. We describe two experiments that apply the architecture to the task of retrieving documents about terrorism from a set of documents from the Reuters news service. The experiments provide positive evidence that the architecture design is feasible and that there are advantages to concept-based methods.

information retrieval, machine learning, natural language, (19 more...)

1304.1128

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Michigan > Wayne County > Detroit (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.34)

Industry:

Law Enforcement & Public Safety > Terrorism (0.60)
Media > News (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Machine Learning, Clustering, and Polymorphy

Hanson, Stephen Jose, Bauer, Malcolm

arXiv.org Artificial Intelligence, Mar-27-2013

This paper describes a machine induction program (WITT) that attempts to model human categorization. Properties of categories to which human subjects are sensitive include best or prototypical members, relative contrasts between putative categories, and polymorphy (neither necessary nor sufficient features). This approach represents an alternative to the usual Artificial Intelligence approaches to generalization and conceptual clustering, which tend to focus on necessary and sufficient feature rules, equivalence classes, and simple search and match schemes. WITT is shown to be more consistent with human categorization while potentially including results produced by more traditional clustering schemes. Applications of this approach in the domains of expert systems and information retrieval are also discussed.

category, machine learning, simulation of human behavior, (18 more...)

1304.3432

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Africa > Zambia > Southern Province > Choma (0.04)

Genre: Research Report (0.64)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.36)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.34)

An Uncertainty Management Calculus for Ordering Searches in Distributed Dynamic Databases

Mukhopadhyay, Uttam

arXiv.org Artificial Intelligence, Mar-27-2013

MINDS is a distributed system of cooperating query engines that customize document retrieval for each user in a dynamic environment. It improves its performance and adapts to changing patterns of document distribution by observing system-user interactions and modifying the appropriate certainty factors, which act as search control parameters. It is argued here that the uncertainty management calculus must account for temporal precedence, reliability of evidence, degree of support for a proposition, and saturation effects. The calculus presented here possesses these features. Some results obtained with this scheme are discussed.

data mining, information retrieval, natural language, (17 more...)

1304.31

Country:

North America > United States > South Carolina > Richland County > Columbia (0.05)
North America > United States > Michigan > Macomb County > Warren (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.57)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Disease Detection and Symptom Tracking by Retrieving Information from the Web

Ku, Lun-Wei (Academia Sinica) | Li, Wan-Lun (National Yunlin University of Science and Technology) | Chang, Ting-Chih (National Yunlin University of Science and Technology)

AAAI Conferences, Mar-21-2013

This paper proposes techniques for preliminary disease detection and personal symptom tracking that adopt concepts and methods from web information retrieval. The proposed approaches are inspired by web users’ behavior: people look for information about symptoms on the Internet. Therefore, using information in Web pages, the developed system proposes possible diseases related to one or more queried symptoms. Moreover, these queried symptoms are recorded in the query log so that users can use these records to trace the history of their symptoms, and further to manage their own health or provide the records to doctors as a reference. As ranking detected diseases requires professional knowledge, we instead evaluate the relevance of retrieved sentences containing detected diseases under both strict and lenient metrics. Experimental results support the proposed ranking approach. The techniques described in this paper are also implemented in an Android application called “Health Generation”. In this application, each detected disease is linked to its Wikipedia introduction and nearby clinics are listed. Users can use the GPS function provided by cell phones to plan a route to them. The aim of this research is to provide medical information and solutions according to users’ needs through the proposed approaches and the application, and further to help users manage their health.

artificial intelligence, information retrieval, natural language, (2 more...)

2013 AAAI Spring Symposium Series

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Communications > Web (0.53)
Information Technology > Communications > Mobile (0.53)

Visualizing and Interacting with Concept Hierarchies

Crampes, Michel, Plantié, Michel

arXiv.org Machine Learning, Mar-11-2013

Concept Hierarchies and Formal Concept Analysis (FCA) are theoretically well-grounded and widely tested methods. They rely on line diagrams called Galois lattices for visualizing and analysing object-attribute sets. Galois lattices are visually appealing and conceptually rich for experts. However, they present important drawbacks due to their concept-oriented overall structure: analysing what they show is difficult for non-experts, navigation is cumbersome, interaction is poor, and scalability is a deep bottleneck for visual interpretation even for experts. In this paper we introduce semantic probes as a means to overcome many of these problems and extend the usability and application possibilities of traditional FCA visualization methods. Semantic probes are visual user-centred objects which extract and organize reduced Galois sub-hierarchies. They are simpler and clearer, and they provide better navigation support through a rich set of interaction possibilities. Since probe-driven sub-hierarchies are limited to the user's focus, scalability is under control and interpretation is facilitated. After some successful experiments, several applications are being developed, with the remaining problem of finding a compromise between simplicity and conceptual expressivity.
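
For readers unfamiliar with the object-attribute sets mentioned here, the sketch below enumerates the formal concepts of a tiny context, which are the nodes a Galois lattice would display. It is a brute-force illustration only and says nothing about how semantic probes are built; the toy context and the function name `formal_concepts` are assumptions made for this example.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a small formal context.

    `context` maps each object to its set of attributes. A concept is a
    pair (extent, intent): the extent is every object having all attributes
    in the intent, and the intent is every attribute shared by the extent.
    """
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def extent(attrs):
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs):
        if not objs:
            # The empty extent shares every attribute by convention.
            return frozenset(attributes)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    concepts = set()
    # Closing every attribute subset reaches every concept (exponential cost).
    for size in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), size):
            ext = extent(set(attrs))
            concepts.add((ext, intent(ext)))
    return concepts


toy_context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}
for ext, itt in sorted(formal_concepts(toy_context),
                       key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
```

The exponential loop over attribute subsets also hints at the scalability problem the abstract raises: the number of concepts can grow very quickly with the size of the context.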

artificial intelligence, information retrieval, natural language, (19 more...)

1303.2488

Country:

North America > United States (0.68)
Europe (0.67)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.84)