AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Computerized databases have aided searches of traditional print-publication indexes, but these still usually require time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought comes in widely different formats, and the universe of knowledge is simply too big, and growing too rapidly, for searching to succeed at a human's slow speed.

Query Complexity of Derivative-Free Optimization

Jamieson, Kevin G. | Nowak, Robert D. | Recht, Benjamin

arXiv.org Machine Learning, Sep-11-2012

This paper provides lower bounds on the convergence rate of Derivative-Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access only to function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects. We also show that the convergence rate is the same regardless of whether DFO is based on noisy function evaluations or on Boolean-valued function comparisons.
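The comparison-only setting can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes a one-dimensional, unimodal objective and a hypothetical noisy oracle that reports whether f(x) < f(y) but flips its answer with a fixed probability, and it runs a ternary search that repeats each comparison and takes a majority vote. The names `noisy_compare`, `majority_compare`, and `comparison_minimize`, and all default parameters, are illustrative assumptions.

```python
import random


def noisy_compare(f, x, y, noise, rng):
    """Hypothetical oracle: reports whether f(x) < f(y), flipped with probability `noise`."""
    truth = f(x) < f(y)
    return truth if rng.random() >= noise else not truth


def majority_compare(f, x, y, noise, rng, votes):
    """Repeat the noisy comparison `votes` times and return the majority answer."""
    wins = sum(noisy_compare(f, x, y, noise, rng) for _ in range(votes))
    return 2 * wins > votes


def comparison_minimize(f, lo, hi, iters=60, noise=0.1, votes=21, seed=0):
    """Ternary search on [lo, hi] that only ever sees Boolean comparisons of f."""
    rng = random.Random(seed)
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if majority_compare(f, a, b, noise, rng, votes):
            hi = b  # f(a) < f(b): for unimodal f the minimizer is not right of b
        else:
            lo = a  # f(a) >= f(b): the minimizer is not left of a
    return (lo + hi) / 2
```

For example, `comparison_minimize(lambda x: (x - 1.3) ** 2, -5.0, 5.0)` should land close to 1.3. Majority voting is a simple way to make a noisy Boolean oracle reliable, at the cost of extra queries per step.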

artificial intelligence, machine learning, natural language

arXiv.org Machine Learning

1209.2434

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.40)

Comparative Study and Optimization of Feature-Extraction Techniques for Content based Image Retrieval

Chadha, Aman | Mallik, Sushmit | Johar, Ravdeep

arXiv.org Artificial Intelligence, Aug-30-2012

The aim of a Content-Based Image Retrieval (CBIR) system, also known as Query by Image Content (QBIC), is to help users retrieve relevant images based on their contents. CBIR technologies find images in large databases by using unique descriptors from a trained image. The image descriptors include the texture, color, intensity, and shape of the object inside an image. Several feature-extraction techniques, viz. Average RGB, Color Moments, Co-occurrence, Local Color Histogram, Global Color Histogram, and Geometric Moment, are critically compared in this paper. Individually, however, these techniques perform poorly, so combinations of them are also evaluated, and results for the most efficient combination are presented and optimized for each class of image query. We also propose improving image retrieval performance through query modification by image cropping, which enables the user to identify a region of interest and modify the initial query to refine and personalize the retrieval results.
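To make the idea of combining descriptors concrete, here is a minimal pure-Python sketch that concatenates two of the listed features, a global color histogram and color moments (only the first two moments, mean and standard deviation, per channel), and ranks database images by L1 distance to the query. Images are assumed to be lists of (r, g, b) tuples; the bin count, the distance, and the function names are illustrative assumptions, not the paper's settings.

```python
def global_color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    n = len(pixels)
    return [h / n for h in hist]


def color_moments(pixels):
    """Mean and standard deviation of each channel, scaled to [0, 1]."""
    feats = []
    n = len(pixels)
    for c in range(3):
        vals = [p[c] for p in pixels]
        mean = sum(vals) / n
        std = (sum((v - mean) ** 2 for v in vals) / n) ** 0.5
        feats.extend([mean / 255, std / 255])
    return feats


def describe(pixels):
    # Concatenating descriptors is one simple way to "combine" techniques.
    return global_color_histogram(pixels) + color_moments(pixels)


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def retrieve(query, database, k=1):
    """Rank database images (name -> pixel list) by L1 distance to the query."""
    q = describe(query)
    ranked = sorted(database, key=lambda name: l1(q, describe(database[name])))
    return ranked[:k]
```

In this sketch, query modification by cropping would amount to passing only the pixels of the selected region as `query`.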

color histogram, data mining, machine learning

arXiv.org Artificial Intelligence

doi: 10.5120/8320-1959

1208.6335

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > North Carolina (0.04)
Asia > India > Maharashtra > Mumbai (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining > Feature Extraction (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Document Clustering Evaluation: Divergence from a Random Baseline

De Vries, Christopher M. | Geva, Shlomo | Trotman, Andrew

arXiv.org Artificial Intelligence, Aug-29-2012

Divergence from a random baseline is a technique for evaluating document clustering. It ensures that cluster quality measures do meaningful work, preventing ineffective clusterings that provide no useful result from receiving high scores. These concepts are defined and analysed using intrinsic and extrinsic approaches to evaluating document cluster quality, including the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate the ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to that of the Normalised Mutual Information (NMI) measure, but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtracting the random baseline provides a clear optimum that is not otherwise apparent. This approach can be applied to any clustering evaluation; this paper describes its use in the context of document clustering evaluation.
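A minimal sketch of the idea, assuming purity as the quality measure and a random baseline built by shuffling cluster assignments (which preserves the cluster size distribution): the reported value is the clustering's score minus the average score of the shuffled clusterings. The helper names, the trial count, and the choice of purity are assumptions for illustration; the paper applies the idea to other measures as well.

```python
import random
from collections import Counter


def purity(clusters, labels):
    """Fraction of documents that share their cluster's majority category."""
    groups = {}
    for c, y in zip(clusters, labels):
        groups.setdefault(c, []).append(y)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return majority / len(labels)


def divergence_from_random(measure, clusters, labels, trials=50, seed=0):
    """Score minus the mean score of random clusterings with the same cluster sizes."""
    rng = random.Random(seed)
    shuffled = list(clusters)
    baseline = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)  # random assignment, cluster sizes preserved
        baseline += measure(shuffled, labels)
    return measure(clusters, labels) - baseline / trials
```

A degenerate clustering that puts every document in its own cluster has purity 1.0, but so does every shuffle of it, so its divergence from the random baseline is 0.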

category, data mining, machine learning

arXiv.org Artificial Intelligence

1208.5654

Country:

Oceania > New Zealand > South Island > Otago > Dunedin (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Active Learning

Settles, Burr

Morgan & Claypool Publishers, Aug-7-2012

The key idea behind active learning is that a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns. An active learner may pose "queries," usually in the form of unlabeled data instances to be labeled by an "oracle" (e.g., a human annotator) that already understands the nature of the problem. This sort of approach is well-motivated in many modern machine learning and data mining applications, where unlabeled data may be abundant or easy to come by, but training labels are difficult, time-consuming, or expensive to obtain. This book is a general introduction to active learning. It outlines several scenarios in which queries might be formulated, and details many query selection algorithms which have been organized into four broad categories, or "query selection frameworks."
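One common query selection strategy is uncertainty sampling. The sketch below is a toy, pool-based version under strong assumptions: one-dimensional data, a separable threshold classifier, and an oracle given as a function. At each step it asks the oracle to label the unlabeled point closest to the current decision threshold, i.e. the point the model is least certain about. The function names are hypothetical.

```python
def fit_threshold(labeled):
    """Threshold classifier (predict 1 iff x >= t) placed midway across the gap."""
    largest_negative = max(x for x, y in labeled if y == 0)
    smallest_positive = min(x for x, y in labeled if y == 1)
    return (largest_negative + smallest_positive) / 2


def uncertainty_sampling(pool, oracle, seed_labeled, budget):
    """Pool-based active learning: repeatedly query the most uncertain point."""
    labeled = list(seed_labeled)
    seen = {x for x, _ in labeled}
    unlabeled = [x for x in pool if x not in seen]
    for _ in range(budget):
        if not unlabeled:
            break
        t = fit_threshold(labeled)
        # The instance nearest the decision boundary is the one the model is least sure of.
        x = min(unlabeled, key=lambda u: abs(u - t))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))
    return fit_threshold(labeled), labeled
```

On 101 evenly spaced points in [0, 1] with a true boundary at 0.537, this narrows the boundary to the gap between 0.53 and 0.54 after about seven queries, behaving like a binary search; labeling points at random would typically need far more labels to pin the boundary down as tightly.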

machine learning, natural language, top description table

Morgan & Claypool Publishers

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.59)

Random Projection with Filtering for Nearly Duplicate Search

Lin, Yue (Zhejiang University) | Jin, Rong (Michigan State University) | Cai, Deng (Zhejiang University) | He, Xiaofei (Zhejiang University)

AAAI Conferences, Jul-21-2012

High-dimensional nearest neighbor search is a fundamental problem with applications in many domains. Although many hashing-based approaches have been proposed for approximate nearest neighbor search in high-dimensional space, one main drawback is that they often return many false positives that must be filtered out by a post-processing step. In this paper, we propose a novel method to address this limitation. The key idea is to introduce a filtering procedure within the search algorithm, based on compressed sensing theory, that effectively removes false-positive answers. We first obtain a sparse representation for each data point with a landmark-based approach, after which we solve the near-duplicate search by exploiting the fact that the difference between the query and its nearest neighbors forms a sparse vector living in a small ℓp ball, where p ≤ 1. Our empirical study on real-world datasets demonstrates the effectiveness of the proposed approach compared with state-of-the-art hashing methods.
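The general pattern of hashing followed by a false-positive filter can be sketched as follows. This is not the paper's method: it uses sign random projections (a standard locality-sensitive hashing scheme) to collect candidates whose codes lie within Hamming distance `radius` of the query's code, and it replaces the compressed-sensing filter with a simple assumed test that keeps a candidate only if it differs from the query in at most `max_changed` coordinates, i.e. if the difference vector is sparse. All names and parameters are illustrative.

```python
import random


def make_hasher(dim, n_bits, seed=0):
    """Sign random projections: bit k is 1 iff <plane_k, v> >= 0."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hasher(v):
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0) for plane in planes)

    return hasher


def build_index(data, hasher):
    """Bucket vector ids by their binary code."""
    index = {}
    for i, v in enumerate(data):
        index.setdefault(hasher(v), []).append(i)
    return index


def differs_sparsely(q, v, max_changed, tol=1e-9):
    """Assumed filter: accept v only if q - v has at most `max_changed` nonzero entries."""
    return sum(abs(a - b) > tol for a, b in zip(q, v)) <= max_changed


def near_duplicate_search(q, data, index, hasher, max_changed, radius=1):
    """Collect candidates from nearby buckets, then filter out false positives."""
    code = hasher(q)
    hits = []
    for key, ids in index.items():
        if sum(a != b for a, b in zip(code, key)) <= radius:
            hits.extend(i for i in ids if differs_sparsely(q, data[i], max_changed))
    return sorted(hits)
```

Unrelated vectors that happen to share a bucket with the query differ from it in almost every coordinate, so the sparsity test discards them, while a copy with one perturbed coordinate survives.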

algorithm, representation, sparse representation

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)

Fine-Grained Entity Recognition

Ling, Xiao (University of Washington) | Weld, Daniel S. (University of Washington)

AAAI Conferences, Jul-21-2012

Entity Recognition (ER) is a key component of relation extraction systems and many other natural-language processing applications. Unfortunately, most ER systems are restricted to producing labels from a small set of entity classes, e.g., person, organization, location, or miscellaneous. To understand text intelligently and extract a wide range of information, it is useful to determine more precisely the semantic classes of entities mentioned in unstructured text. This paper defines a fine-grained set of 112 tags, formulates the tagging problem as multi-class, multi-label classification, describes an unsupervised method for collecting training data, and presents the FIGER implementation. Experiments show that the system accurately predicts the tags for entities. Moreover, it provides useful information for a relation extraction system, increasing the F1 score by 93%. We make FIGER and its data available as a resource for future work.
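Formulating tagging as multi-class, multi-label classification means each mention can receive several tags at once, for example both a coarse and a fine type. A minimal sketch, assuming sparse binary features and one independent perceptron per tag (one-vs-rest); the feature strings and path-style tag names used with it are hypothetical, and this is not necessarily the classifier FIGER uses.

```python
from collections import defaultdict


def train_one_vs_rest(examples, tags, epochs=10):
    """One binary perceptron per tag; examples are (feature_set, tag_set) pairs."""
    weights = {t: defaultdict(float) for t in tags}
    bias = {t: 0.0 for t in tags}
    for _ in range(epochs):
        for feats, gold in examples:
            for t in tags:
                score = bias[t] + sum(weights[t][f] for f in feats)
                target = 1 if t in gold else -1
                if score * target <= 0:  # mistake: standard perceptron update
                    for f in feats:
                        weights[t][f] += target
                    bias[t] += target
    return weights, bias


def predict(weights, bias, feats):
    """Return every tag whose score is positive (multi-label output)."""
    return {
        t for t in weights
        if bias[t] + sum(weights[t].get(f, 0.0) for f in feats) > 0
    }
```

Because each tag has its own decision, a mention with features such as `head:player` can be predicted as both `/person` and `/person/athlete`.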

extraction, iger, proceedings

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Illinois (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media (0.46)
Government (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.93)

Low-Rank Matrix Recovery via Efficient Schatten p-Norm Minimization

Nie, Feiping (University of Texas, Arlington) | Huang, Heng (University of Texas, Arlington) | Ding, Chris (University of Texas, Arlington)

AAAI Conferences, Jul-21-2012

As an emerging machine learning and information retrieval technique, matrix completion has been successfully applied to many scientific applications, such as collaborative prediction in information retrieval and video completion in computer vision. Matrix completion aims to recover a low-rank matrix with a fraction of its entries arbitrarily corrupted. Instead of solving the popular trace-norm (or nuclear-norm) based objective, we directly minimize the original formulations of the trace norm and the rank norm. We propose a novel Schatten p-norm optimization framework that unifies different norm formulations. We derive an efficient algorithm to solve the new objective, followed by a rigorous theoretical proof of its convergence. The previous main solution strategy for this problem requires computing singular value decompositions, a task whose cost grows as matrix size and rank increase. Our algorithm has a closed-form solution in each iteration, and hence it converges fast. As a consequence, our algorithm can solve large-scale matrix completion problems. Empirical studies on recommendation system data sets demonstrate the promising performance of our new optimization framework and efficient algorithm.
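For reference, the Schatten p-norm of an m × n matrix X with singular values σ_i(X) is defined below (for 0 < p < 1 it is only a quasi-norm). Setting p = 1 gives the trace (nuclear) norm, p = 2 gives the Frobenius norm, and as p → 0 the quantity ‖X‖_{S_p}^p counts the nonzero singular values, i.e. it approaches the rank; this is how a single family can interpolate between the two formulations. The completion problem on the last line, with Ω the set of observed entries of a data matrix M, is a generic sketch and may differ from the paper's exact objective.

```latex
\|X\|_{S_p} = \Big( \sum_{i=1}^{\min(m,n)} \sigma_i(X)^p \Big)^{1/p}, \qquad 0 < p < \infty

\|X\|_{S_1} = \sum_i \sigma_i(X) = \|X\|_*, \qquad
\|X\|_{S_2} = \|X\|_F, \qquad
\lim_{p \to 0^+} \|X\|_{S_p}^p = \operatorname{rank}(X)

\min_{X} \; \|X\|_{S_p}^p \quad \text{s.t.} \quad X_{ij} = M_{ij}, \; (i, j) \in \Omega
```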

information retrieval, machine learning, natural language

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country: North America > United States > Texas > Tarrant County > Arlington (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)

Improving Twitter Retrieval by Exploiting Structural Information

Luo, Zhunchen (National University of Defense Technology) | Osborne, Miles (The University of Edinburgh) | Petrović, Saša (The University of Edinburgh) | Wang, Ting (National University of Defense Technology)

AAAI Conferences, Jul-21-2012

Most Twitter search systems treat a tweet as plain text when modeling relevance. However, a series of conventions allows users to tweet in structural ways, using combinations of different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a different communicative intent, and the sequence of blocks captures changing discourse. Previous work shows that exploiting structural information can improve the retrieval of structured documents (e.g., web pages). In this paper we use the structure of tweets, induced by these blocks, for Twitter retrieval. A set of features derived from the blocks of text and their combinations is used in a learning-to-rank setting. We show that structuring tweets can achieve state-of-the-art performance. Our approach does not rely upon social media features, but when we do add this additional information, performance improves significantly.
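The block structure can be illustrated with a small segmenter. The sketch assumes whitespace tokenization and simple regular expressions for links, mentions, and hashtags (so a token such as `@alice,` with trailing punctuation is treated as text); it derives a few hypothetical structural features of the kind that could feed a learning-to-rank model, but it does not reproduce the paper's feature set.

```python
import re

# Assumed block patterns; any token matching none of them is plain text.
TOKEN_TYPES = [
    ("link", re.compile(r"https?://\S+")),
    ("mention", re.compile(r"@\w+")),
    ("hashtag", re.compile(r"#\w+")),
]


def segment(tweet):
    """Split a tweet into (type, text) blocks, merging adjacent plain-text tokens."""
    blocks = []
    for token in tweet.split():
        kind = next((name for name, pat in TOKEN_TYPES if pat.fullmatch(token)), "text")
        if kind == "text" and blocks and blocks[-1][0] == "text":
            blocks[-1] = ("text", blocks[-1][1] + " " + token)
        else:
            blocks.append((kind, token))
    return blocks


def structure_features(tweet):
    """Hypothetical structural features derived from the block sequence."""
    kinds = [kind for kind, _ in segment(tweet)]
    feats = {f"has_{name}": int(name in kinds) for name, _ in TOKEN_TYPES}
    feats["n_blocks"] = len(kinds)
    feats["starts_with_mention"] = int(bool(kinds) and kinds[0] == "mention")
    feats["pattern"] = "-".join(kinds)
    return feats
```

For "Great talk by @alice on #IR today http://t.co/x", the block pattern is `text-mention-text-hashtag-text-link`.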

information retrieval, natural language, tweet

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom (0.04)
Asia > Middle East > Iraq (0.04)
Asia > China > Hunan Province (0.04)

Genre: Research Report > New Finding (0.94)

Industry: Information Technology > Services (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Double-Bit Quantization for Hashing

Kong, Weihao (Shanghai Jiao Tong University) | Li, Wu-Jun (Shanghai Jiao Tong University)

AAAI Conferences, Jul-21-2012

Hashing, which tries to learn similarity-preserving binary codes for data representation, has been widely used for efficient nearest neighbor search in massive databases because of its fast query speed and low storage cost. Because directly computing the best binary codes for a given data set is NP-hard, mainstream hashing methods typically adopt a two-stage strategy. In the first stage, several projected dimensions of real values are generated. In the second stage, the real values are quantized into binary codes by thresholding. Currently, most existing methods use a single bit to quantize each projected dimension. One problem with this single-bit quantization (SBQ) is that the threshold typically lies in the region of highest point density, so many neighboring points close to the threshold are hashed to different bits, which contradicts the similarity-preserving principle of hashing. In this paper, we propose a novel quantization strategy, called double-bit quantization (DBQ), to solve this problem of SBQ. The basic idea of DBQ is to quantize each projected dimension into two bits with adaptively learned thresholds. Extensive experiments on two real data sets show that our DBQ strategy can significantly outperform the traditional SBQ strategy for hashing.
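The difference between the two quantization schemes shows up on points near the threshold. In the sketch below, the two DBQ thresholds `a` and `b` are passed in by hand rather than learned adaptively as in the paper, and the three regions are encoded as 01, 00, 10, an assumed code assignment under which adjacent regions differ by one bit and the two outer regions by two.

```python
def sbq(values, threshold=0.0):
    """Single-bit quantization: one bit per projected dimension."""
    return [int(v >= threshold) for v in values]


def dbq(values, a, b):
    """Double-bit quantization with hand-picked thresholds a < b.

    Regions (-inf, a), [a, b), [b, inf) are encoded as 01, 00, 10.
    """
    codes = []
    for v in values:
        if v < a:
            codes.extend([0, 1])
        elif v < b:
            codes.extend([0, 0])
        else:
            codes.extend([1, 0])
    return codes


def hamming(x, y):
    """Number of positions at which two codes differ."""
    return sum(p != q for p, q in zip(x, y))
```

With threshold 0, SBQ maps the nearby values -0.01 and 0.01 to different bits, the same Hamming distance it gives to the distant pair -2 and 2. With a = -1 and b = 1, DBQ maps the first pair to the same code (distance 0) and the second pair to codes at distance 2.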

information retrieval, machine learning, natural language

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Shanghai > Shanghai (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Building Contextual Anchor Text Representation using Graph Regularization

Dai, Na (Lehigh University)

AAAI Conferences, Jul-21-2012

Anchor texts are useful complementary descriptions of target pages and are widely used to improve search relevance. The benefits come from the additional information introduced into the document representation and from intelligent ways of estimating its relative importance. Previous work on anchor importance estimation treated each anchor text independently, without considering its context. As a result, the lack of contextual constraints fails to guarantee a stable anchor text representation. We propose an anchor graph regularization approach that incorporates constraints from such context into the anchor text weighting process, casting the task as a convex quadratic optimization problem. The constraints are drawn from estimates of anchor-anchor, anchor-page, and page-page similarity. Because it can start from the output of any estimator, our approach operates as a post-processing step that refines the estimated anchor weights, making it a plug-and-play component in search infrastructure. Comparative experiments on standard data sets (TREC 2009 and 2010) demonstrate the efficacy of our approach.
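A minimal sketch of graph-regularized weight refinement, under assumptions: a single symmetric similarity matrix `S` over anchors (the paper draws on anchor-anchor, anchor-page, and page-page similarities), initial weights `w0` from any estimator, and the convex quadratic objective in the docstring. Setting its gradient to zero gives the linear system (I + λL)w = w0 with graph Laplacian L = D - S, which this sketch solves by Gauss-Seidel sweeps. The objective and the name `regularize_weights` are illustrative, not the paper's exact formulation.

```python
def regularize_weights(initial, similarity, lam=1.0, iters=200):
    """Minimize sum_i (w_i - w0_i)^2 + (lam / 2) * sum_{i,j} S_ij * (w_i - w_j)^2.

    The stationarity condition is (I + lam * L) w = w0 with L = D - S, a symmetric
    positive definite system, solved here by Gauss-Seidel sweeps.
    """
    n = len(initial)
    w = list(initial)
    for _ in range(iters):
        for i in range(n):
            neighbor_sum = sum(similarity[i][j] * w[j] for j in range(n) if j != i)
            degree = sum(similarity[i][j] for j in range(n) if j != i)
            # Each weight is pulled toward its initial estimate and its similar neighbors.
            w[i] = (initial[i] + lam * neighbor_sum) / (1 + lam * degree)
    return w
```

Because the rows of L sum to zero, this refinement preserves the total weight: it redistributes weight between similar anchors rather than creating new weight. An anchor with no similar neighbors keeps its initial estimate.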

information retrieval, machine learning, natural language

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Middle East > Lebanon (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Web (0.95)
Information Technology > Information Management (0.94)
