Goto

Collaborating Authors

 Asia


Double-Bit Quantization for Hashing

AAAI Conferences

Hashing, which tries to learn similarity-preserving binary codes for data representation, has been widely used for efficient nearest neighbor search in massive databases due to its fast query speed and low storage cost. Because it is NP hard to directly compute the best binary codes for a given data set, mainstream hashing methods typically adopt a two-stage strategy. In the first stage, several projected dimensions of real values are generated. Then in the second stage, the real values will be quantized into binary codes by thresholding. Currently, most existing methods use one single bit to quantize each projected dimension. One problem with this single-bit quantization (SBQ) is that the threshold typically lies in the region of the highest point density and consequently a lot of neighboring points close to the threshold will be hashed to totally different bits, which is unexpected according to the principle of hashing. In this paper, we propose a novel quantization strategy, called double-bit quantization (DBQ), to solve the problem of SBQ. The basic idea of DBQ is to quantize each projected dimension into double bits with adaptively learned thresholds. Extensive experiments on two real data sets show that our DBQ strategy can significantly outperform traditional SBQ strategy for hashing.


Content Recommendation for Attention Management in Unified Social Messaging

AAAI Conferences

With the growing popularity of social networks and collaboration systems, people are increasingly working with or socially connected with each other. Unified messaging system provides a single interface for users to receive and process information from multiple sources. It is highly desirable to design attention management solution that can help users easily navigate and process dozens of unread messages from a unified message system. Moreover, with the proliferation of mobile devices people are now selectively consuming the most important messages on the go between different activities in their daily life. The information overload problem is especially acute for mobile users with small screen to display. In this paper, we present \PAM, an intelligent end-to-end Personalized Attention Management solution that employs analytical techniques that can learn user interests and organize and prioritize incoming messages based on user interests. For a list of unread messages, \PAM generates a concise attention report that allows users to quickly scan the important new messages from his important social connections as well as messages about his most important tasks that the user is involved with. Our solution can also be applied in other applications such as news filtering and alerts on mobile devices. Our evaluation results demonstrate the effectiveness of \PAM.


Document Summarization Based on Data Reconstruction

AAAI Conferences

Document summarization is of great value to many real world applications, such as snippets generation for search results and news headlines generation. Traditionally, document summarization is implemented by extracting sentences that cover the main topics of a document with a minimum redundancy. In this paper, we take a different perspective from data reconstruction and propose a novel framework named Document Summarization based on Data Reconstruction (DSDR). Specifically, our approach generates a summary which consist of those sentences that can best reconstruct the original document. To model the relationship among sentences, we introduce two objective functions: (1) linear reconstruction, which approximates the document by linear combinations of the selected sentences; (2) nonnegative linear reconstruction, which allows only additive, not subtractive, linear combinations. In this framework, the reconstruction error becomes a natural criterion for measuring the quality of the summary. For each objective function, we develop an efficient algorithm to solve the corresponding optimization problem. Extensive experiments on summarization benchmark data sets DUC 2006 and DUC 2007 demonstrate the effectiveness of our proposed approach.


Choosing Linguistics over Vision to Describe Images

AAAI Conferences

In this paper, we address the problem of automatically generating human-like descriptions for unseen images, given a collection of images and their corresponding human-generated descriptions. Previous attempts for this task mostly rely on visual clues and corpus statistics, but do not take much advantage of the semantic information inherent in the available image descriptions. Here, we present a generic method which benefits from all these three sources (i.e. visual clues, corpus statistics and available descriptions) simultaneously, and is capable of constructing novel descriptions. Our approach works on syntactically and linguistically motivated phrases extracted from the human descriptions. Experimental evaluations demonstrate that our formulation mostly generates lucid and semantically correct descriptions, and significantly outperforms the previous methods on automatic evaluation metrics. One of the significant advantages of our approach is that we can generate multiple interesting descriptions for an image. Unlike any previous work, we also test the applicability of our method on a large dataset containing complex images with rich descriptions.


Table Header Detection and Classification

AAAI Conferences

In digital libraries, a table, as a specific document component as well as a condensed way to present structured and relational data, contains rich information and often the only source of .that information. In order to explore, retrieve, and reuse that data, tables should be identified and the data extracted. Table recognition is an old field of research. However, due to the diversity of table styles, the results are still far from satisfactory, and not a single algorithm performs well on all different types of tables. In this paper, we randomly take samples from the CiteSeerX to investigate diverse table styles for automatic table extraction. We find that table headers are one of the main characteristics of complex table styles. We identify a set of features that can be used to segregate headers from tabular data and build a classifier to detect table headers. Our empirical evaluation on PDF documents shows that using a Random Forest classifier achieves an accuracy of 92%.


MCTS Based on Simple Regret

AAAI Conferences

UCT, a state-of-the art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB, a sampling policy for the Multi-armed Bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final ``arm pull'' (the actual move selection) that collects a reward, rather than all ``arm pulls''. Therefore, it makes more sense to minimize the simple regret, as opposed to the cumulative regret. We begin by introducing policies for multi-armed bandits with lower finite-time and asymptotic simple regret than UCB, using it to develop a two-stage scheme (SR+CR) for MCTS which outperforms UCT empirically. Optimizing the sampling process is itself a metareasoning problem, a solution of which can use value of information (VOI) techniques. Although the theory of VOI for search exists, applying it to MCTS is non-trivial, as typical myopic assumptions fail. Lacking a complete working VOI theory for MCTS, we nevertheless propose a sampling scheme that is ``aware'' of VOI, achieving an algorithm that in empirical evaluation outperforms both UCT and the other proposed algorithms.


Conflict-Based Search For Optimal Multi-Agent Path Finding

AAAI Conferences

In the multi agent path finding problem (MAPF) paths should be found for several agents, each with a different start and goal position such that agents do not collide. Previous optimal solvers applied global A*-based searches. We present a new search algorithm called Conflict Based Search (CBS). CBS is a two-level algorithm. At the high level, a search is performed on a tree based on conflicts between agents. At the low level, a search is performed only for a single agent at a time. In many cases this reformulation enables CBS to examine fewer states than A* while still maintaining optimality. We analyze CBS and show its benefits and drawbacks. Experimental results on various problems shows a speedup of up to a full order of magnitude over previous approaches.


Polynomially Decomposable Global Cost Functions in Weighted Constraint Satisfaction

AAAI Conferences

In maintaining consistencies, such as GAC*, FDGAC* and weak EDGAC*, for global cost functions, Weighted CSP (WCSP) solvers rely on the projection and extension operations, which entail the computation of the cost functions' minima.  Tractability of this minimum computation is essential for efficient execution. Since projections/extensions modify the cost functions, an important issue is tractable projection-safety , concerning whether minimum cost computation remains tractable after projections/extensions. In this paper, we prove that tractable projection-safety is always possible for projections/extensions to/from the nullary cost function ( W 0 ), and always impossible for projections/extensions to/from n -ary cost functions for n > = 2.  When n = 1, the answer is indefinite.  We give a simple negative example, while Lee and Leung's flow-based projection-safe cost functions are also tractable projection-safe. We propose polynomially decomposable cost functions, which are amenable to tractable minimum computation.  We further prove that the polynomial decomposability property is unaffected by projections/extensionsto/from unary cost functions.  Thus, polynomially decomposable cost functions are tractable projection-safe.  We show that the SOFT_AMONG, SOFT_REGULAR, SOFT_GRAMMAR and MAX_WEIGHT/MIN_WEIGHT are polynomially decomposable.  They are embedded in a WCSP solver for extensive experiments to confirm the feasibility and efficiency of our proposal.


Iterative Resource Allocation for Memory Intensive Parallel Search Algorithms on Clouds, Grids, and Shared Clusters

AAAI Conferences

The increasing availability of “utility computing” resources such as clouds, grids, and massively parallel shared clusters can provide practically unlimited processing and memory capacity on demand, at some cost per unit of resource usage. This requires a new perspective in the design and evaluation of parallel search algorithms. Previous work in parallel search implicitly assumed ownership of a cluster with a static amount of CPU cores and RAM, and emphasized wallclock runtime. With utility computing resources, trade-offs between performance and monetary costs must be considered. This paper considers dynamically increasing the usage of utility computing resources until a problem is solved. Efficient resource allocation policies are analyzed in comparison with an optimal allocation strategy. We evaluate our iterative allocation strategy by applying it to the HDA* parallel search algorithm. The experimental results validate our theoretical predictions. They show that, in practice, the costs incurred by iterative allocation are reasonably close to an optimal (but a priori unknown) policy, and are significantly better than the worst-case analytical bounds.


Partial-Expansion A* with Selective Node Generation

AAAI Conferences

A* is often described as being `optimal', in that it expands the minimum number of unique nodes. But, A* may generate many extra nodes which are never expanded. This is a performance loss, especially when the branching factor is large. Partial Expansion A* addresses this problem when expanding a node, n, by generating all the children of n but only storing children with the same f-cost as n. n is re-inserted into the OPEN list, but with the f-cost of the next best child. This paper introduces an enhanced version of PEA* (EPEA*). Given a priori domain knowledge, EPEA* generates only the children with the same f-cost as the parent. EPEA* is generalized to its iterative-deepening variant, EPE-IDA*. For some domains, these algorithms yield substantial performance improvements. State-of-the-art results were obtained for the pancake puzzle and for some multi-agent pathfinding instances. Drawbacks of EPEA* are also discussed.