Deng, Cheng
Learning A Structured Optimal Bipartite Graph for Co-Clustering
Nie, Feiping, Wang, Xiaoqian, Deng, Cheng, Huang, Heng
Co-clustering methods have been widely applied to document clustering and gene expression analysis. These methods make use of the duality between features and samples such that the co-occurring structure of sample and feature clusters can be extracted. In graph based co-clustering methods, a bipartite graph is constructed to depict the relation between features and samples. Most existing co-clustering methods conduct clustering on the graph achieved from the original data matrix, which doesnโt have explicit cluster structure, thus they require a post-processing step to obtain the clustering results. In this paper, we propose a novel co-clustering method to learn a bipartite graph with exactly k connected components, where k is the number of clusters. The new bipartite graph learned in our model approximates the original graph but maintains an explicit cluster structure, from which we can immediately get the clustering results without post-processing. Extensive empirical results are presented to verify the effectiveness and robustness of our model.
Boosting Complementary Hash Tables for Fast Nearest Neighbor Search
Liu, Xianglong (Beihang University) | Deng, Cheng ( Xidian University ) | Mu, Yadong (Peking University) | Li, Zhujin (Beihang University)
Hashing has been proven a promising technique for fast nearest neighbor search over massive databases. In many practical tasks it usually builds multiple hash tables for a desired level of recall performance. However, existing multi-table hashing methods suffer from the heavy table redundancy, without strong table complementarity and effective hash code learning. To address the problem, this paper proposes a multi-table learning method which pursues a specified number of complementary and informative hash tables from a perspective of ensemble learning. By regarding each hash table as a neighbor prediction model, the multi-table search procedure boils down to a linear assembly of predictions stemming from multiple tables. Therefore, a sequential updating and learning framework is naturally established in a boosting mechanism, theoretically guaranteeing the table complementarity and algorithmic convergence. Furthermore, each boosting round pursues the discriminative hash functions for each table by a discrete optimization in the binary code space. Extensive experiments carried out on two popular tasks including Euclidean and semantic nearest neighbor search demonstrate that the proposed boosted complementary hash-tables method enjoys the strong table complementarity and significantly outperforms the state-of-the-arts.
Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval
Yang, Erkun (Xidian University) | Deng, Cheng (Xidian University) | Liu, Wei (Tencent) | Liu, Xianglong (Beihang University) | Tao, Dacheng (University of Technology Sydney) | Gao, Xinbo (Xidian University)
With benefits of low storage cost and fast query speed, cross-modal hashing has received considerable attention recently. However, almost all existing methods on cross-modal hashing cannot obtain powerful hash codes due to directly utilizing hand-crafted features or ignoring heterogeneous correlations across different modalities, which will greatly degrade the retrieval performance. In this paper, we propose a novel deep cross-modal hashing method to generate compact hash codes through an end-to-end deep learning architecture, which can effectively capture the intrinsic relationships between various modalities. Our architecture integrates different types of pairwise constraints to encourage the similarities of the hash codes from an intra-modal view and an inter-modal view, respectively. Moreover, additional decorrelation constraints are introduced to this architecture, thus enhancing the discriminative ability of each hash bit. Extensive experiments show that our proposed method yields state-of-the-art results on two cross-modal retrieval datasets.
New l1-Norm Relaxations and Optimizations for Graph Clustering
Nie, Feiping (University of Texas at Arlington) | Wang, Hua (Colorado School of Mines) | Deng, Cheng (Xidian University) | Gao, Xinbo (Xidian University) | Li, Xuelong (Chinese Academy of Sciences) | Huang, Heng (University of Texas at Arlington)
In recent data mining research, the graph clustering methods, such as normalized cut and ratio cut, have been well studied and applied to solve many unsupervised learning applications. The original graph clustering methods are NP-hard problems. Traditional approaches used spectral relaxation to solve the graph clustering problems. The main disadvantage of these approaches is that the obtained spectral solutions could severely deviate from the true solution. To solve this problem, in this paper, we propose a new relaxation mechanism for graph clustering methods. Instead of minimizing the squared distances of clustering results, we use the l1-norm distance. More important, considering the normalized consistency, we also use the l1-norm for the normalized terms in the new graph clustering relaxations. Due to the sparse result from the l1-norm minimization, the solutions of our new relaxed graph clustering methods get discrete values with many zeros, which are close to the ideal solutions. Our new objectives are difficult to be optimized, because the minimization problem involves the ratio of nonsmooth terms. The existing sparse learning optimization algorithms cannot be applied to solve this problem. In this paper, we propose a new optimization algorithm to solve this difficult non-smooth ratio minimization problem. The extensive experiments have been performed on three two-way clustering and eight multi-way clustering benchmark data sets. All empirical results show that our new relaxation methods consistently enhance the normalized cut and ratio cut clustering results.
Drosophila Gene Expression Pattern Annotations via Multi-Instance Biological Relevance Learning
Wang, Hua (Colorado School of Mines) | Deng, Cheng (Xidian University) | Zhang, Hao (Colorado School of Mines) | Gao, Xinbo (Xidian University) | Huang, Heng (University of Texas at Arlington)
Recent developments in biologyhave produced a large number of gene expression patterns, many of which have been annotated textually with anatomical and developmental terms. These terms spatially correspond to local regions of the images, which are attached collectively to groups of images. Because one does not know which term is assigned to which region of which image in the group, the developmental stage classification and anatomical term annotation turn out to be a multi-instance learning (MIL) problem, which considers input as bags of instances and labels are assigned to the bags. Most existing MIL methods routinely use the Bag-to-Bag (B2B) distances, which, however, are often computationally expensive and may not truly reflect the similarities between the anatomical and developmental terms. In this paper, we approach the MIL problem from a new perspective using the Class-to-Bag (C2B) distances, which directly assesses the relations between annotation terms and image panels. Taking into account the two challenging properties of multi-instance gene expression data, high heterogeneity and weak label association, we computes the C2B distance by introducing class specific distance metrics and locally adaptive significance coefficients.We apply our new approach to automatic gene expression pattern classification and annotation on the Drosophila melanogaster species. Extensive experiments have demonstrated the effectiveness of our new method.
Optimizing Bag Features for Multiple-Instance Retrieval
Fu, Zhouyu (University of Western Sydney, Kingswood) | Pan, Feifei (New York Institute of Technology) | Deng, Cheng (Xidian University) | Liu, Wei (IBM T. J. Watson Research Center)
Multiple-Instance (MI) learning is an important supervised learning technique which deals with collections of instances called bags. While existing research in MI learning mainly focused on classification, in this paper we propose a new approach for MI retrieval to enable effective similarity retrieval of bags of instances, where training data is presented in the form of similar and dissimilar bag pairs. An embedded scheme is devised as encoding each bag into a single bag feature vector by exploiting a similarity-based transformation. In this way, the original MI problem is converted into a single-instance version. Furthermore, we develop a principled approach for optimizing bag features specific to similarity retrieval through leveraging pairwise label information at the bag level. The experimental results demonstrate the effectiveness of the proposed approach in comparison with the alternatives for MI retrieval.