AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps

Millar, Jeremy R. (Air Force Institute of Technology) | Peterson, Gilbert L. (Air Force Institute of Technology) | Mendenhall, Michael J. (Air Force Institute of Technology)

AAAI ConferencesMay-21-2009

Clustering and visualization of large text document collections aids in browsing, navigation, and information retrieval. We present a document clustering and visualization method based on Latent Dirichlet Allocation and self-organizing maps (LDA-SOM). LDA-SOM clusters documents based on topical content and renders clusters in an intuitive two-dimensional format. Document topics are inferred using a probabilistic topic model. Then, due to the topology preserving properties of self-organizing maps, document clusters with similar topic distributions are placed near one another in the visualization. This provides the user an intuitive means of browsing from one cluster to another based on topics held in common. The effectiveness of LDA-SOM is evaluated on the 20 Newsgroups and NIPS data sets.

document collection, topic distribution, vector, (15 more...)

AAAI Conferences

Twenty-Second International FLAIRS Conference

Country:

Asia > Middle East > Jordan (0.06)
North America > United States > New York (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
(3 more...)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Add feedback

Modeling Semantic Question Context for Question Answering

Banerjee, Protima (Drexel University) | Han, Hyoil (Drexel University)

AAAI ConferencesMay-21-2009

Within a Question Answering (QA) framework, Question Context plays a vital role. We define Question Context to be background knowledge that can be used to represent the user’s information need more completely than the terms in the query alone. This paper proposes a novel approach that uses statistical language modeling techniques to develop a semantic Question Context which we then incorporate into the Information Retrieval (IR) stage of QA. Our approach proposes an Aspect-Based Relevance Language Model as basis of the Question Context Model. This model proposes that the sparse vocabulary of a query can be supplemented with semantic information from concepts (or aspects) related to query terms that already exist within the corpus. We incorporate the Aspect-Based Relevance Language Model into Question Context by first obtaining all of the latent concepts that exist in the corpus for a particular question topic. Then, we derive a likelihood of relevance that relates each Context Term (CT) associated with those aspects to the user’s query. Context Terms from the topics with the highest likelihood of relevance are then incorporated into the query language model based on their relevance score values. We use both query expansion and document model smoothing techniques and evaluate our approach using the traditional recall metric. Our results are promising and show significant improvements recall at low levels of precision using the query expansion method.

information, query, question context, (14 more...)

AAAI Conferences

Twenty-Second International FLAIRS Conference

Country:

North America > United States > Virginia > Fairfax County > McLean (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Napa County (0.04)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.77)

Add feedback

Mining Meaning from Wikipedia

Medelyan, Olena, Milne, David, Legg, Catherine, Witten, Ian H.

arXiv.org Artificial IntelligenceMay-9-2009

Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval and information extraction; and as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.

relation, wikipedia, wikipedia article, (17 more...)

arXiv.org Artificial Intelligence

0809.4530

Country:

North America > United States > Texas (0.14)
North America > Canada > Ontario > Middlesex County > London (0.14)
Oceania > New Zealand > North Island > Waikato (0.04)
(31 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Media > Film (0.92)
Leisure & Entertainment > Sports (0.92)
(5 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(7 more...)

Add feedback

Switcher-random-walks: a cognitive-inspired mechanism for network exploration

Goñi, Joaquín, Martincorena, Iñigo, Corominas-Murtra, Bernat, Arrondo, Gonzalo, Ardanza-Trevijano, Sergio, Villoslada, Pablo

arXiv.org Artificial IntelligenceMar-24-2009

Semantic memory is the subsystem of human memory that stores knowledge of concepts or meanings, as opposed to life specific experiences. The organization of concepts within semantic memory can be understood as a semantic network, where the concepts (nodes) are associated (linked) to others depending on perceptions, similarities, etc. Lexical access is the complementary part of this system and allows the retrieval of such organized knowledge. While conceptual information is stored under certain underlying organization (and thus gives rise to a specific topology), it is crucial to have an accurate access to any of the information units, e.g. the concepts, for efficiently retrieving semantic information for real-time needings. An example of an information retrieval process occurs in verbal fluency tasks, and it is known to involve two different mechanisms: -clustering-, or generating words within a subcategory, and, when a subcategory is exhausted, -switching- to a new subcategory. We extended this approach to random-walking on a network (clustering) in combination to jumping (switching) to any node with certain probability and derived its analytical expression based on Markov chains. Results show that this dual mechanism contributes to optimize the exploration of different network models in terms of the mean first passage time. Additionally, this cognitive inspired dual mechanism opens a new framework to better understand and evaluate exploration, propagation and transport phenomena in other complex systems where switching-like phenomena are feasible.

information retrieval, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

doi: 10.1142/S0218127410026204

0903.4132

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain > Navarre > Pamplona (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)
(2 more...)

Add feedback

Unsupervised Methods for Determining Object and Relation Synonyms on the Web

Yates, A., Etzioni, O.

Journal of Artificial Intelligence ResearchMar-24-2009

The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called Resolver , introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, Resolver resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of resolver's probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic Resolver system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.

extraction, relation, resolver, (17 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2772

AI Access Foundation

10591

Journal of Artificial Intelligence Research

Country:

North America > United States > Virginia (0.04)
North America > United States > West Virginia (0.04)
North America > United States > District of Columbia > Washington (0.04)
(5 more...)

Genre:

Overview (0.85)
Research Report > New Finding (0.67)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

Add feedback

Symbolic Computing with Incremental Mindmaps to Manage and Mine Data Streams - Some Applications

Brucks, Claudine, Hilker, Michael, Schommer, Christoph, Wagner, Cynthia, Weires, Ralph

arXiv.org Artificial IntelligenceFeb-18-2009

In our understanding, a mind-map is an adaptive engine that basically works incrementally on the fundament of existing transactional streams. Generally, mind-maps consist of symbolic cells that are connected with each other and that become either stronger or weaker depending on the transactional stream. Based on the underlying biologic principle, these symbolic cells and their connections as well may adaptively survive or die, forming different cell agglomerates of arbitrary size. In this work, we intend to prove mind-maps' eligibility following diverse application scenarios, for example being an underlying management system to represent normal and abnormal traffic behaviour in computer networks, supporting the detection of the user behaviour within search engines, or being a hidden communication layer for natural language interaction.

data mining, information retrieval, machine learning, (20 more...)

arXiv.org Artificial Intelligence

0902.3196

Country:

Oceania > Australia > Tasmania > Hobart (0.04)
North America > United States > New Mexico (0.04)
North America > United States > Massachusetts (0.04)
(6 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

A General Boosting Method and its Application to Learning Ranking Functions for Web Search

Zheng, Zhaohui, Zha, Hongyuan, Zhang, Tong, Chapelle, Olivier, Chen, Keke, Sun, Gordon

Neural Information Processing SystemsDec-31-2008

We present a general boosting method extending functional gradient boosting to optimize complex loss functions that are encountered in many machine learning problems. Our approach is based on optimization of quadratic upper bounds of the loss functions which allows us to present a rigorous convergence analysis of the algorithm. More importantly, this general framework enables us to use a standard regression base learner such as decision trees for fitting any loss function. We illustrate an application of the proposed method in learning ranking functions for Web search by combining both preference data and labeled data for training. We present experimental results for Web search using data from a commercial search engine that show significant improvements of our proposed methods over some existing methods.

information retrieval, machine learning, preference data, (21 more...)

Neural Information Processing Systems

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.34)

Add feedback

Mining Internet-Scale Software Repositories

Linstead, Erik, Rigor, Paul, Bajracharya, Sushil, Lopes, Cristina, Baldi, Pierre F.

Neural Information Processing SystemsDec-31-2008

Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop an infrastructure for the automated crawling, parsing, and database storage of open source software. The infrastructure allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 java projects from SourceForge and Apache totaling over 38 million lines of code from 9,250 developers. Simple statistical analyses of the data first reveal robust power-law behavior for package, SLOC, and method call distributions. We then develop and apply unsupervised author-topic, probabilistic models to automatically discover the topics embedded in the code and extract topic-word and author-topic distributions. In addition to serving as a convenient summary for program function and developer activities, these and other related distributions provide a statistical and information-theoretic basis for quantifying and analyzing developer similarity and competence, topic scattering, and document tangling, with direct applications to software engineering. Finally, by combining software textual content with structural information captured by our CodeRank approach, we are able to significantly improve software retrieval performance, increasing the AUC metric to 0.86-- roughly 10-30% better than previous approaches based on text alone.

information retrieval, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

A learning framework for nearest neighbor search

Cayton, Lawrence, Dasgupta, Sanjoy

Neural Information Processing SystemsDec-31-2008

Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.

information retrieval, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Genre: Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.41)

Add feedback

Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks

Carterette, Ben, Jones, Rosie

Neural Information Processing SystemsDec-31-2008

We propose a model that leverages the millions of clicks received by web search engines, to predict document relevance. This allows the comparison of ranking functions when clicks are available but complete relevance judgments are not. After an initial training phase using a set of relevance judgments paired with click data, we show that our model can predict the relevance score of documents that have not been judged. These predictions can be used to evaluate the performance of a search engine, using our novel formalization of the confidence of the standard evaluation metric discounted cumulative gain (DCG), so comparisons can be made across time and datasets. This contrasts with previous methods which can provide only pair-wise relevance judgements between results shown for the same query. When no relevance judgments are available, we can identify the better of two ranked lists up to 82% of the time, and with only two relevance judgments for each query, we can identify the better ranking up to 94% of the time. While our experiments are on sponsored search results, which is the financial backbone of web search, our method is general enough to be applicable to algorithmic web search results as well. Furthermore, we give an algorithm to guide the selection of additional documents to judge to improve confidence.

artificial intelligence, information retrieval, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.92)

Add feedback