AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

An Information Retrieval Approach to Finding Dependent Subspaces of Multiple Views

Lin, Ziyuan, Peltonen, Jaakko

arXiv.org Machine LearningJan-7-2016

Finding relationships between multiple views of data is essential both for exploratory analysis and as pre-processing for predictive tasks. A prominent approach is to apply variants of Canonical Correlation Analysis (CCA), a classical method seeking correlated components between views. The basic CCA is restricted to maximizing a simple dependency criterion, correlation, measured directly between data coordinates. We introduce a new method that finds dependent subspaces of views directly optimized for the data analysis task of \textit{neighbor retrieval between multiple views}. We optimize mappings for each view such as linear transformations to maximize cross-view similarity between neighborhoods of data samples. The criterion arises directly from the well-defined retrieval task, detects nonlinear and local similarities, is able to measure dependency of data relationships rather than only individual data coordinates, and is related to well understood measures of information retrieval quality. In experiments we show the proposed method outperforms alternatives in preserving cross-view neighborhood similarities, and yields insights into local dependencies between multiple views.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1511.06423

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.62)

Add feedback

What Do You Need to Know to Use a Search Engine? Why We Still Need to Teach Research Skills

Russell, Daniel M. (Google)

AI MagazineDec-31-2015

For the vast majority of queries (for example, navigation, simple fact lookup, and others), search engines do extremely well. They also highlight many of the issues that are common to sophisticated AI question-answering systems. Rapid and ready-at-hand access, depth of processing, and the way they enable people to offload some ordinary memory tasks suggest that search engines have become more of a cognitive amplifier than a simple repository or front-end to the Internet. Although search engines are superb at finding and presenting information--up to and including extracting complex relations and making simple inferences--knowing how to frame questions and evaluate their results for accuracy and credibility remains an ongoing challenge.

artificial intelligence, information management, search engine, (6 more...)

AI Magazine

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Add feedback

What Do You Need to Know to Use a Search Engine? Why We Still Need to Teach Research Skills

Russell, Daniel M. (Google)

AI MagazineDec-31-2015

For the vast majority of queries (for example, navigation, simple fact lookup, and others), search engines do extremely well. Their ability to quickly provide answers to queries is a remarkable testament to the power of many of the fundamental methods of AI. They also highlight many of the issues that are common to sophisticated AI question-answering systems. It has become clear that people think of search programs in ways that are very different from traditional information sources. Rapid and ready-at-hand access, depth of processing, and the way they enable people to offload some ordinary memory tasks suggest that search engines have become more of a cognitive amplifier than a simple repository or front-end to the Internet. Like all sophisticated tools, people still need to learn how to use them. Although search engines are superb at finding and presenting information—up to and including extracting complex relations and making simple inferences—knowing how to frame questions and evaluate their results for accuracy and credibility remains an ongoing challenge. Some questions are still deep and complex, and still require knowledge on the part of the search user to work through to a successful answer. And the fact that the underlying information content, user interfaces, and capabilities are all in a continual state of change means that searchers need to continually update their knowledge of what these programs can (and cannot) do.

information retrieval, natural language, question answering, (17 more...)

AI Magazine

Country:

Europe > Germany (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.05)
(14 more...)

Genre:

Research Report (0.67)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Health & Medicine (0.68)
Education > Educational Setting > K-12 Education (0.46)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.86)

Add feedback

An Active Learning Framework using Sparse-Graph Codes for Sparse Polynomials and Graph Sketching

Li, Xiao, Ramchandran, Kannan

Neural Information Processing SystemsDec-31-2015

Let $f: \{-1,1\}^n \rightarrow \mathbb{R}$ be an $n$-variate polynomial consisting of $2^n$ monomials, in which only $s\ll 2^n$ coefficients are non-zero. The goal is to learn the polynomial by querying the values of $f$. We introduce an active learning framework that is associated with a low query cost and computational runtime. The significant savings are enabled by leveraging sampling strategies based on modern coding theory, specifically, the design and analysis of {\it sparse-graph codes}, such as Low-Density-Parity-Check (LDPC) codes, which represent the state-of-the-art of modern packet communications. More significantly, we show how this design perspective leads to exciting, and to the best of our knowledge, largely unexplored intellectual connections between learning and coding. The key is to relax the worst-case assumption with an ensemble-average setting, where the polynomial is assumed to be drawn uniformly at random from the ensemble of all polynomials (of a given size $n$ and sparsity $s$). Our framework succeeds with high probability with respect to the polynomial ensemble with sparsity up to $s={O}(2^{\delta n})$ for any $\delta\in(0,1)$, where $f$ is exactly learned using ${O}(ns)$ queries in time ${O}(n s \log s)$, even if the queries are perturbed by Gaussian noise. We further apply the proposed framework to graph sketching, which is the problem of inferring sparse graphs by querying graph cuts. By writing the cut function as a polynomial and exploiting the graph structure, we propose a sketching algorithm to learn the an arbitrary $n$-node unknown graph using only few cut queries, which scales {\it almost linearly} in the number of edges and {\it sub-linearly} in the graph size $n$. Experiments on real datasets show significant reductions in the runtime and query complexity compared with competitive schemes.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)

Add feedback

Copeland Dueling Bandits

Zoghi, Masrour, Karnin, Zohar S., Whiteson, Shimon, Rijke, Maarten de

Neural Information Processing SystemsDec-31-2015

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form O(K log T) but require restrictive assumptions, or offer bounds of the form O(K^2 log T) without requiring such assumptions. Our results offer the best of both worlds: O(K log T) bounds without restrictive assumptions.

data mining, information retrieval, machine learning, (21 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.70)
Information Technology > Data Science > Data Mining > Big Data (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Add feedback

Information retrieval in folktales using natural language processing

Groza, Adrian, Corde, Lidia

arXiv.org Artificial IntelligenceNov-10-2015

Recognising literary characters in various narrative texts is challenging both from the literary and technical perspective. From the literary viewpoint, the meaning of the term "character" leaves space to various interpretations. From the technical perspective, literary texts contain a lot of data about emotions, social life or inner life of the characters, while they are very thin on technical, straightforward messages. To infer the character type from literary texts might pose problems even to the human readers [4]. Interactions between literary characters contain rich social networks.

artificial intelligence, information retrieval, natural language, (18 more...)

arXiv.org Artificial Intelligence

1511.03012

Country: Europe > Romania (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.52)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.51)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.47)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Add feedback

CiteSeerX: AI in a Digital Library Search Engine

AI MagazineSep-28-2015

CiteSeerX is a digital library search engine providing access to more than five million scholarly documents with nearly a million users and millions of hits per day. These AI technologies have been developed by CiteSeerX group members over the past 5–6 years. We also present AI technologies implemented in table and algorithm search, which are special search modes in CiteSeerX. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.

artificial intelligence, digital library search engine, information management, (6 more...)

AI Magazine

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Information Management > Search (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.95)

Add feedback

An End-to-End Conversational Second Screen Application for TV Program Discovery

AI MagazineSep-28-2015

In this article, we report on a multiphase R&D effort to develop a conversational second screen application for TV program discovery. Our goal is to share with the community the breadth of artificial intelligence (AI) and natural language (NL) technologies required to develop such an application along with learnings from target end-users. We first give an overview of our application from the perspective of the end-user. We then present the architecture of our application along with the main AI and NL components, which were developed over multiple phases. The first phase focuses on enabling core functionality such as effectively finding programs matching the user’s intent. The second phase focuses on enabling dialog with the user. Finally, we present two user studies, corresponding to these two phases. The results from both studies demonstrate the effectiveness of our application in the target domain.

artificial intelligence, machine learning, natural language, (21 more...)

AI Magazine

Country:

Europe (0.93)
North America > United States > California (0.68)

Genre:

Overview (1.00)
Questionnaire & Opinion Survey (0.69)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology (0.93)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
(8 more...)

Add feedback

A Practioner's Guide to Evaluating Entity Resolution Results

Barnes, Matt

arXiv.org Machine LearningSep-14-2015

Entity resolution (ER) is the task of identifying records belonging to the same entity (e.g. individual, group) across one or multiple databases. Ironically, it has multiple names: deduplication and record linkage, among others. In this paper we survey metrics used to evaluate ER results in order to iteratively improve performance and guarantee sufficient quality prior to deployment. Some of these metrics are borrowed from multi-class classification and clustering domains, though some key differences exist differentiating entity resolution from general clustering. Menestrina et al. empirically showed rankings from these metrics often conflict with each other, thus our primary motivation for studying them. This paper provides practitioners the basic knowledge to begin evaluating their entity resolution results.

information retrieval, machine learning, natural language, (14 more...)

arXiv.org Machine Learning

1509.04238

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Add feedback

Performance Bounds for Pairwise Entity Resolution

Barnes, Matt, Miller, Kyle, Dubrawski, Artur

arXiv.org Machine LearningSep-10-2015

One significant challenge to scaling entity resolution algorithms to massive datasets is understanding how performance changes after moving beyond the realm of small, manually labeled reference datasets. Unlike traditional machine learning tasks, when an entity resolution algorithm performs well on small hold-out datasets, there is no guarantee this performance holds on larger hold-out datasets. We prove simple bounding properties between the performance of a match function on a small validation set and the performance of a pairwise entity resolution algorithm on arbitrarily sized datasets. Thus, our approach enables optimization of pairwise entity resolution algorithms for large datasets, using a small set of labeled data.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1509.03302

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)

Genre: Research Report > New Finding (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback