Integrating Image Content and its Associated Text in a Web Image Retrieval Agent

AAAI Conferences

The World-Wide-Web, with its graphical interface and its ability to integrate different media and hyperlinks, is chiefly responsible for this growth. Finding useful information by navigating such a large information space has proved difficult (Cheong 1996). Several tools have been developed in the last couple of years to assist the search for information on the WWW. These tools are often referred to as Spiders, since they build indices of the information available on the WWW while traversing this Web of information. The term Internet Agents is also used to refer to these information retrieval services.

Minimally Complete Recommendations

AAAI Conferences

Recent research has highlighted the benefits of completeness as a retrieval criterion in recommender systems. In complete retrieval, any subset of the constraints in a given query that can be satisfied must be satisfied by at least one of the retrieved products. Minimal completeness (i.e., always retrieving the smallest set of products needed for completeness) is also beginning to attract research interest as a way to minimize cognitive load in the approach. Other important features of a retrieval algorithm’s behavior include the diversity of the retrieved products and the order in which they are presented to the user. In this paper, we present a new algorithm for minimally complete retrieval (MCR) in which the ranking of retrieved products is primarily based on the number of constraints that they satisfy, but other measures such as similarity or utility can also be used to inform the retrieval process. We also present theoretical and empirical results that demonstrate our algorithm’s ability to minimize cognitive load while ensuring the completeness and diversity of the retrieved products.
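The completeness criterion described above can be illustrated with a small sketch. This is not the MCR algorithm from the paper. It only assumes that each constraint is a predicate over a product, and it keeps one representative product for each maximal set of satisfied constraints, ranked by the number of constraints satisfied. The names `satisfied`, `minimally_complete`, and the example products and constraints are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

Product = Dict[str, object]
Constraint = Callable[[Product], bool]


def satisfied(product: Product, constraints: List[Constraint]) -> FrozenSet[int]:
    """Indices of the query constraints that this product satisfies."""
    return frozenset(i for i, c in enumerate(constraints) if c(product))


def minimally_complete(products: List[Product],
                       constraints: List[Constraint]) -> List[Product]:
    """Illustrative sketch: one product per maximal satisfied-constraint set."""
    # First product seen for each distinct, non-empty satisfied-constraint set.
    reps: Dict[FrozenSet[int], Product] = {}
    for p in products:
        s = satisfied(p, constraints)
        if s:
            reps.setdefault(s, p)
    # Drop any set strictly contained in another: its constraints are
    # already covered by the product representing the larger set.
    maximal = [s for s in reps if not any(s < t for t in reps)]
    # Rank primarily by the number of constraints satisfied.
    maximal.sort(key=len, reverse=True)
    return [reps[s] for s in maximal]


# Hypothetical catalogue and query.
products = [
    {"name": "A", "price": 90, "beds": 2, "garden": False},
    {"name": "B", "price": 150, "beds": 3, "garden": True},
    {"name": "C", "price": 95, "beds": 3, "garden": False},
    {"name": "D", "price": 200, "beds": 1, "garden": True},
]
constraints = [
    lambda p: p["price"] <= 100,
    lambda p: p["beds"] >= 3,
    lambda p: p["garden"],
]
result = [p["name"] for p in minimally_complete(products, constraints)]
print(result)
```

In this example, products A and D are omitted because every constraint they satisfy is also satisfied by C or B respectively, so the retrieved set stays complete with fewer items.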

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

Neural Information Processing Systems

This paper explores the task of interactive image retrieval using natural language queries, where a user progressively provides input queries to refine a set of retrieval results. Moreover, our work explores this problem in the context of complex image scenes containing multiple objects. We propose Drill-down, an effective framework for encoding multiple queries with an efficient compact state representation that significantly extends current methods for single-round image retrieval. We show that using multiple rounds of natural language queries as input can be surprisingly effective for finding arbitrarily specific images of complex scenes. Furthermore, we find that existing image datasets with textual captions can provide a surprisingly effective form of weak supervision for this task.

Multimedia Search with Pseudo-Relevance Feedback

AAAI Conferences

We present an algorithm for video retrieval that fuses the decisions of multiple retrieval agents in both text and image modalities. While the normalization and combination of evidence are novel, this paper emphasizes the successful use of negative pseudo-relevance feedback to improve image retrieval performance. Although the results are still far from perfect, pseudo-relevance feedback shows great promise for multimedia retrieval in very noisy data. The Informedia Digital Video Library Project: Video is a rich source of information, with aspects of content available both visually and acoustically. The Informedia Digital Video Library project focuses specifically on information extraction from video and audio content.

Group-Pair Convolutional Neural Networks for Multi-View Based 3D Object Retrieval

AAAI Conferences

In recent years, research interest in object retrieval has shifted from 2D towards 3D data. Despite many well-designed approaches, we point out that limitations still exist and there is tremendous room for improvement, including the heavy reliance on hand-crafted features, the separate optimization of feature extraction and object retrieval, and the lack of sufficient training samples. In this work, we address the above limitations for 3D object retrieval by developing a novel end-to-end solution named Group-Pair Convolutional Neural Network (GPCNN). It can jointly learn the visual features from multiple views of a 3D model and optimize towards the object retrieval task. To tackle the insufficient training data issue, we innovatively employ a pair-wise learning scheme, which learns model parameters from the similarity of each sample pair, rather than the traditional way of learning from sparse label–sample matching. Extensive experiments on three public benchmarks show that our GPCNN solution significantly outperforms the state-of-the-art methods with 3% to 42% improvement in retrieval accuracy.
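The data side of a pair-wise learning scheme can be sketched as follows. This is only an illustration of how sample pairs might be built from class labels, not the GPCNN training procedure itself. The function name `make_pairs` and the example labels are hypothetical. The point it shows is that n labelled samples yield n(n-1)/2 unordered pairs, each with a similarity target, which is how pairing enlarges a small training set.

```python
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple


def make_pairs(labels: Sequence[Hashable]) -> List[Tuple[int, int, int]]:
    """Return (i, j, same) for every unordered pair of sample indices.

    same is 1 when the two samples share a class label and 0 otherwise.
    """
    return [
        (i, j, int(labels[i] == labels[j]))
        for i, j in combinations(range(len(labels)), 2)
    ]


# Hypothetical labels for four 3D models.
labels = ["chair", "chair", "table", "lamp"]
pairs = make_pairs(labels)
print(len(pairs), "pairs from", len(labels), "samples")
for i, j, same in pairs:
    print(i, j, "similar" if same else "dissimilar")
```

A pair-based model would then be trained to score similar pairs higher than dissimilar ones, for instance with a contrastive-style loss. Which loss the authors use is not stated in the abstract.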