University of Technology Sydney
Boosting for Real-Time Multivariate Time Series Classification
Wang, Haishuai (University of Technology Sydney) | Wu, Jun (Beijing Jiaotong University)
Multivariate time series (MTS) data are useful for detecting abnormal cases in healthcare. In this paper, we propose an ensemble boosting algorithm that classifies abnormal surgery time series based on learnt shapelet features. Specifically, we first learn shapelets from the multivariate time series via logistic regression. Building on the learnt shapelets, we then propose an MTS ensemble boosting approach for time series that arrive in a streaming fashion. Experimental results on a real-world medical dataset demonstrate the effectiveness of the proposed methods.
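For concreteness, here is a minimal Python sketch of the shapelet-distance feature extraction this kind of method builds on: each learnt shapelet is scored against a series by its best sliding-window match, and the resulting distances become inputs to the boosted classifier. All names and constants are illustrative, not the authors' implementation.

```python
import numpy as np

def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Smallest mean squared distance between the shapelet and any
    equal-length sliding window of the series."""
    m = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    return float(np.min(np.mean((windows - shapelet) ** 2, axis=1)))

def shapelet_features(series: np.ndarray, shapelets: list) -> np.ndarray:
    """One distance feature per shapelet; for a multivariate series this
    would be applied per channel and the features concatenated."""
    return np.array([shapelet_distance(series, s) for s in shapelets])

# Toy usage: score two hypothetical shapelets against a noisy sine wave.
t = np.linspace(0, 4 * np.pi, 200)
x = np.sin(t) + 0.1 * np.random.randn(200)
shapelets = [np.sin(np.linspace(0, np.pi, 20)), np.ones(20)]
print(shapelet_features(x, shapelets))
```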
Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval
Yang, Erkun (Xidian University) | Deng, Cheng (Xidian University) | Liu, Wei (Tencent) | Liu, Xianglong (Beihang University) | Tao, Dacheng (University of Technology Sydney) | Gao, Xinbo (Xidian University)
Owing to its low storage cost and fast query speed, cross-modal hashing has received considerable attention recently. However, almost all existing cross-modal hashing methods fail to obtain powerful hash codes because they directly utilize hand-crafted features or ignore the heterogeneous correlations across different modalities, which greatly degrades retrieval performance. In this paper, we propose a novel deep cross-modal hashing method that generates compact hash codes through an end-to-end deep learning architecture, which can effectively capture the intrinsic relationships between modalities. Our architecture integrates different types of pairwise constraints to encourage similarity of the hash codes from an intra-modal view and an inter-modal view, respectively. Moreover, additional decorrelation constraints are introduced to enhance the discriminative ability of each hash bit. Extensive experiments show that our proposed method yields state-of-the-art results on two cross-modal retrieval datasets.
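As an illustration of the kind of pairwise constraint described above, the sketch below scores relaxed (tanh-activated) hash codes against a 0/1 similarity matrix with a logistic pairwise loss; applying it within one modality gives an intra-modal term and across modalities an inter-modal term. The loss form and names are assumptions for illustration, not the paper's exact objective.

```python
import numpy as np

def pairwise_hash_loss(codes_a: np.ndarray, codes_b: np.ndarray,
                       sim: np.ndarray) -> float:
    """Negative log-likelihood of pairwise similarities under a logistic
    model of code inner products: theta_ij = 0.5 * <a_i, b_j>."""
    theta = 0.5 * codes_a @ codes_b.T
    # log(1 + exp(theta)) - sim * theta, averaged over all pairs
    return float(np.mean(np.logaddexp(0.0, theta) - sim * theta))

rng = np.random.default_rng(0)
img_codes = np.tanh(rng.standard_normal((8, 16)))   # relaxed image codes
txt_codes = np.tanh(rng.standard_normal((8, 16)))   # relaxed text codes
labels = rng.integers(0, 2, 8)
sim = (labels[:, None] == labels[None, :]).astype(float)
print("inter-modal loss:", pairwise_hash_loss(img_codes, txt_codes, sim))
print("intra-modal loss:", pairwise_hash_loss(img_codes, img_codes, sim))
```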
Probabilistic Non-Negative Matrix Factorization and Its Robust Extensions for Topic Modeling
Luo, Minnan (Xi'an Jiaotong University) | Nie, Feiping (Northwestern Polytechnical University) | Chang, Xiaojun (University of Technology Sydney) | Yang, Yi (University of Technology Sydney) | Hauptmann, Alexander (Carnegie Mellon University) | Zheng, Qinghua (Xi'an Jiaotong University)
Traditional topic models with maximum likelihood estimation inevitably suffer from the assumption that words are conditionally independent given the document's topic distribution. In this paper, we follow the generative procedure of topic models but learn the topic-word distribution and the document-topic distribution by directly approximating the word-document co-occurrence matrix with matrix decomposition techniques. Our methods include: (1) approximating the normalized document-word conditional distribution by the product of a document probability matrix and a word probability matrix, based on probabilistic non-negative matrix factorization (NMF); and (2) since standard NMF is well known to be non-robust to noise and outliers, extending the probabilistic NMF topic model to robust versions using l21-norm and capped l21-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of the factors in topic models while making the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are developed to solve the corresponding non-smooth and non-convex problems. Experimental results on several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.
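The three objectives can be written schematically as follows; the notation is ours, reconstructed from the abstract, with X the normalized word-document matrix, U the topic-word factor, and V the topic-document factor.

```latex
% Schematic objectives consistent with the abstract (notation ours):
% X \approx UV, with U, V nonnegative (and column-stochastic in the
% probabilistic variant, so that factors keep probabilistic meaning).
\min_{U,V \ge 0}\ \|X - UV\|_F^2
  \quad \text{(probabilistic NMF)}
\min_{U,V \ge 0}\ \sum_j \|x_j - U v_j\|_2
  \quad (\ell_{2,1}\text{-norm loss, robust to outlier documents})
\min_{U,V \ge 0}\ \sum_j \min\!\big(\|x_j - U v_j\|_2,\ \varepsilon\big)
  \quad \text{(capped } \ell_{2,1}\text{, ignores residuals beyond } \varepsilon)
```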
Compressed K-Means for Large-Scale Clustering
Shen, Xiaobo (Nanjing University of Science and Technology) | Liu, Weiwei (University of Technology Sydney) | Tsang, Ivor (University of Technology Sydney) | Shen, Fumin (University of Electronic Science and Technology of China) | Sun, Quan-Sen (Nanjing University of Science and Technology)
Large-scale clustering is widely used in many applications and has received much attention. Most existing clustering methods suffer from both expensive computation and high memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage is significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in both computation and memory cost, while achieving comparable clustering accuracy.
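Below is a minimal sketch of the Hamming-distance assignment step such a method relies on, assuming 0/1 code arrays: XOR plus a bit count replaces Euclidean distance. The names are illustrative, and a production version would pack bits into machine words and use popcount rather than broadcasting.

```python
import numpy as np

def hamming_assign(codes: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """codes: (n, b) and centers: (k, b) arrays of 0/1 bits; returns the
    index of the nearest center in Hamming distance for every point."""
    # XOR via != on 0/1 arrays; summing differing bits gives the distance
    dists = (codes[:, None, :] != centers[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, (1000, 64), dtype=np.uint8)
centers = rng.integers(0, 2, (10, 64), dtype=np.uint8)
print(np.bincount(hamming_assign(codes, centers), minlength=10))  # cluster sizes
```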
Approximate Conditional Gradient Descent on Multi-Class Classification
Liu, Zhuanghua (University of Technology Sydney) | Tsang, Ivor (University of Technology Sydney)
Conditional gradient descent, aka the Frank-Wolfe algorithm, has regained popularity in recent years. The key advantage of Frank-Wolfe is that at each step the expensive projection is replaced with a much more efficient linear optimization step. As with gradient descent, evaluating the loss in Frank-Wolfe scales with the data size, so training on big data poses a challenge. Recently, stochastic Frank-Wolfe methods have been proposed to address this, but they do not perform well in practice. In this work, we study the problem of approximating the Frank-Wolfe algorithm on the large-scale multi-class classification problem, which is a typical application of the Frank-Wolfe algorithm. We present a simple but effective method that exploits the internal structure of the data to approximate Frank-Wolfe on the large-scale multi-class classification problem. Empirical results verify that our method outperforms state-of-the-art stochastic projection-free methods.
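For readers unfamiliar with the algorithm, here is a generic Frank-Wolfe iteration over the probability simplex, where the linear oracle reduces to picking the best vertex. This illustrates the projection-free step the abstract refers to; it is not the paper's approximation method.

```python
import numpy as np

def frank_wolfe_simplex(grad_fn, x0, n_iters=100):
    """Minimize a smooth f over the probability simplex given grad_fn."""
    x = x0.copy()
    for t in range(n_iters):
        g = grad_fn(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # linear oracle: best simplex vertex
        gamma = 2.0 / (t + 2.0)        # standard diminishing step size
        x = (1 - gamma) * x + gamma * s
    return x

# Example: minimize ||x - y||^2, i.e. project y onto the simplex.
y = np.array([0.5, 0.9, -0.2, 0.1])
x0 = np.full(4, 0.25)
print(frank_wolfe_simplex(lambda x: 2 * (x - y), x0, 500))  # approx [0.3, 0.7, 0, 0]
```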
Learning Sparse Confidence-Weighted Classifier on Very High Dimensional Data
Tan, Mingkui (University of Adelaide) | Yan, Yan (University of Technology Sydney) | Wang, Li (University of Illinois at Chicago) | Hengel, Anton Van Den (University of Adelaide) | Tsang, Ivor W. (University of Technology Sydney) | Shi, Qinfeng (Javen) (University of Adelaide)
Confidence-weighted (CW) learning is a successful online learning paradigm that maintains a Gaussian distribution over classifier weights and adopts a covariance matrix to represent the uncertainties of the weight vectors. However, existing full CW learning paradigms have two deficiencies: sensitivity to irrelevant features, and poor scalability to high-dimensional data due to the maintenance of the covariance structure. In this paper, we begin by presenting an online-batch CW learning scheme, and then present a novel paradigm to learn sparse CW classifiers. The proposed paradigm essentially identifies feature groups and naturally builds a block-diagonal covariance structure, making it very suitable for CW learning over very high-dimensional data. Extensive experimental results demonstrate the superior performance of the proposed methods over state-of-the-art counterparts on classification and feature selection tasks.
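To make the CW idea concrete, the toy below runs a diagonal AROW-style update, a standard rule in the CW family in which low-variance (confident) features move less. The paper's block-diagonal grouping generalizes this diagonal simplification; the code is illustrative, not the authors' algorithm.

```python
import numpy as np

def arow_diag_update(mu, sigma, x, y, r=1.0):
    """One online update on example (x, y), y in {-1, +1}; mu is the weight
    mean and sigma the per-feature variance of the Gaussian over weights."""
    margin = y * mu @ x
    conf = np.sum(sigma * x * x)            # x^T diag(sigma) x
    beta = 1.0 / (conf + r)
    alpha = max(0.0, 1.0 - margin) * beta   # hinge-style step size
    mu = mu + alpha * y * sigma * x
    sigma = sigma - beta * (sigma * x) ** 2
    return mu, np.maximum(sigma, 1e-8)      # keep variances positive

rng = np.random.default_rng(0)
d = 20
mu, sigma = np.zeros(d), np.ones(d)
w_true = rng.standard_normal(d)
for _ in range(500):
    x = rng.standard_normal(d)
    mu, sigma = arow_diag_update(mu, sigma, x, np.sign(w_true @ x))
print("sign agreement with true weights:", np.mean(np.sign(mu) == np.sign(w_true)))
```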
Teaching-to-Learn and Learning-to-Teach for Multi-label Propagation
Gong, Chen (Shanghai Jiao Tong University and University of Technology Sydney) | Tao, Dacheng (University of Technology Sydney) | Yang, Jie (Shanghai Jiao Tong University) | Liu, Wei (Didi Research, Beijing, China)
Multi-label propagation aims to transmit multi-label information from labeled examples to unlabeled examples based on a weighted graph. Existing methods ignore the specific propagation difficulty of different unlabeled examples and conduct the propagation in an imperfect sequence, leading to error-prone classification of difficult examples with uncertain labels. To address this problem, this paper associates each possible label with a "teacher" and proposes a "Multi-Label Teaching-to-Learn and Learning-to-Teach" (ML-TLLT) algorithm, so that the entire propagation process is guided by the teachers and proceeds from simple examples to more difficult ones. In the teaching-to-learn step, the teachers select the simplest examples for the current propagation by investigating both the definitiveness of each possible label of the unlabeled examples and the label dependencies revealed by the labeled examples. In the learning-to-teach step, the teachers in turn learn from the learner's feedback to properly select the simplest examples for the next propagation. Thorough empirical studies show that, thanks to the optimized propagation sequence designed by the teachers, ML-TLLT generally outperforms seven state-of-the-art methods on typical multi-label benchmark datasets.
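The sketch below conveys the underlying idea of propagating labels in a chosen curriculum order over a weighted graph, with a trivial "most connected first" heuristic standing in for the paper's teacher/learner selection. It is an assumption-laden toy, not ML-TLLT itself.

```python
import numpy as np

def propagate_in_order(W, Y, labeled, order):
    """W: (n, n) symmetric weights; Y: (n, L) multi-label matrix with rows
    filled for labeled examples; order: sequence of unlabeled indices."""
    F = Y.astype(float).copy()
    known = list(labeled)
    for i in order:
        w = W[i, known]
        F[i] = w @ F[known] / (w.sum() + 1e-12)  # weighted label average
        known.append(i)                          # i now acts as a source
    return F

rng = np.random.default_rng(0)
n, L = 8, 3
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
Y = np.zeros((n, L)); Y[:3] = rng.integers(0, 2, (3, L))
unlabeled = list(range(3, n))
order = sorted(unlabeled, key=lambda i: -W[i, :3].sum())  # "simplest" first
print(propagate_in_order(W, Y, [0, 1, 2], order).round(2))
```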
Concepts Not Alone: Exploring Pairwise Relationships for Zero-Shot Video Activity Recognition
Gan, Chuang (Tsinghua University) | Lin, Ming (University of Michigan, Ann Arbor) | Yang, Yi (University of Technology Sydney) | Melo, Gerard de (Tsinghua University) | Hauptmann, Alexander G. (Carnegie Mellon University)
Vast quantities of videos are now being captured at astonishing rates, but the majority are not labelled. To cope with such data, we consider the task of content-based activity recognition in videos without any manually labelled examples, also known as zero-shot video recognition. To achieve this, videos are represented in terms of detected visual concepts, which are then scored as relevant or irrelevant according to their similarity with a given textual query. In this paper, we propose a more robust approach for scoring concepts in order to alleviate the brittleness and low-precision problems of previous work. We jointly consider semantic relatedness, visual reliability, and discriminative power. To handle noise and non-linearities in the ranking scores of the selected concepts, we propose a novel pairwise order matrix approach for score aggregation. Extensive experiments on the large-scale TRECVID Multimedia Event Detection data show the superiority of our approach.
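As a rough illustration of aggregating scores through pairwise orders rather than raw values, the Borda-style toy below sums each concept's pairwise ranking votes across videos, which is robust to per-concept scale and monotone distortions. The paper's pairwise order matrix method is more sophisticated, so treat this only as intuition.

```python
import numpy as np

def pairwise_order_aggregate(scores: np.ndarray) -> np.ndarray:
    """scores: (n_concepts, n_videos). Returns one aggregated score per
    video from the summed pairwise order matrix."""
    # order[c, i, j] = +1 if concept c ranks video i above video j
    order = np.sign(scores[:, :, None] - scores[:, None, :])
    agg = order.sum(axis=0)          # accumulate votes across concepts
    return agg.sum(axis=1)           # row sums give a Borda-like score

rng = np.random.default_rng(0)
true_relevance = np.array([3.0, 1.0, 2.0, 0.5])
# five concept detectors observe relevance with different scales and noise
scores = true_relevance * rng.uniform(0.5, 2.0, (5, 1)) \
         + rng.standard_normal((5, 4))
print(pairwise_order_aggregate(scores))  # ordering should roughly follow 3 > 2 > 1 > 0.5
```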
Random Mixed Field Model for Mixed-Attribute Data Restoration
Li, Qiang (University of Technology Sydney) | Bian, Wei (University of Technology Sydney) | Xu, Richard Yi Da (University of Technology Sydney) | You, Jane (The Hong Kong Polytechnic University) | Tao, Dacheng (University of Technology Sydney)
Restoring noisy and incomplete data is a critical preprocessing step in developing effective learning algorithms, which aims to reduce the effect of noise and missing values in data. By utilizing attribute correlations and/or instance similarities, various techniques have been developed for data denoising and imputation tasks. However, existing data restoration methods are either specifically designed for a particular task or incapable of dealing with mixed-attribute data. In this paper, we develop a new probabilistic model that provides a general and principled method for restoring mixed-attribute data. The main contributions of this study are twofold: a) a unified generative model, utilizing a generic random mixed field (RMF) prior, is designed to exploit mixed-attribute correlations; and b) a structured mean-field variational approach is proposed to solve the challenging inference problem of simultaneous denoising and imputation. We evaluate our method via classification experiments on both synthetic data and real benchmark datasets. The experiments demonstrate that our approach effectively improves the classification accuracy of noisy and incomplete data compared with other data restoration methods.
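To convey the flavor of the mean-field inference mentioned above, here is a textbook mean-field update for a binary pairwise field: each variable keeps a factorized belief that is iteratively refreshed from its neighbors' beliefs. The paper's random mixed field couples discrete and continuous attributes and uses a structured variant, which this toy does not attempt.

```python
import numpy as np

def mean_field_binary(J, h, n_iters=50):
    """J: (n, n) symmetric couplings with zero diagonal, h: (n,) unary
    fields, states in {-1, +1}. Returns magnetizations m_i = E[x_i]."""
    m = np.zeros(len(h))
    for _ in range(n_iters):
        for i in range(len(h)):
            m[i] = np.tanh(h[i] + J[i] @ m)   # coordinate-wise update
    return m

J = np.array([[0.0, 0.8, 0.0],
              [0.8, 0.0, 0.8],
              [0.0, 0.8, 0.0]])
h = np.array([0.5, 0.0, -0.2])
print(mean_field_binary(J, h).round(3))
```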
Shakeout: A New Regularized Deep Neural Network Training Scheme
Kang, Guoliang (University of Technology Sydney) | Li, Jun (University of Technology Sydney) | Tao, Dacheng (University of Technology Sydney)
Recent years have witnessed the success of deep neural networks in dealing with a wide range of practical problems. The invention of effective training techniques has contributed largely to this success. The so-called "Dropout" training scheme is one of the most powerful tools for reducing over-fitting. From a statistical point of view, Dropout works by implicitly imposing an L2 regularizer on the weights. In this paper, we present a new training scheme: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, our method randomly chooses to enhance or reverse the contribution of each unit to the next layer. We show that our scheme leads to a combination of L1 and L2 regularization imposed on the weights, which has been proved effective by Elastic Net models in practice. We have empirically evaluated the Shakeout scheme and demonstrated that sparse network weights are obtained via Shakeout training. Our classification experiments on the real-life image datasets MNIST and CIFAR-10 show that Shakeout deals with over-fitting effectively.
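Here is a minimal sketch of the Shakeout perturbation under our reading of the abstract: each input unit's outgoing weights are either enhanced or reversed to a fixed magnitude with flipped sign, and setting the constant c to zero recovers (inverted) Dropout. Constants and shapes are illustrative; consult the paper for the exact formulation.

```python
import numpy as np

def shakeout_weights(W, tau=0.5, c=0.1, rng=None):
    """W: (n_in, n_out). Returns a randomly perturbed copy of W in which
    each input unit's outgoing weights are enhanced (prob 1 - tau) or
    reversed to magnitude c with flipped sign (prob tau)."""
    rng = rng or np.random.default_rng()
    keep = rng.random(W.shape[0]) > tau            # one draw per input unit
    sgn = np.sign(W)
    enhanced = (W + c * tau * sgn) / (1.0 - tau)   # kept: scaled up, L1 push
    reversed_ = -c * sgn                           # dropped: sign-flipped
    # E[result] = W, and c = 0 recovers inverted Dropout on the weights
    return np.where(keep[:, None], enhanced, reversed_)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
print(shakeout_weights(W, tau=0.5, c=0.1, rng=rng).round(2))
```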