South China University of Technology
Decentralized Gradient-Quantization Based Matrix Factorization for Fast Privacy-Preserving Point-of-Interest Recommendation
Zhou, Xuebin (South China University of Technology) | Hu, Zhibin (South China Normal University) | Huang, Jin (South China Normal University) | Chen, Jian (South China University of Technology)
With the rapid growth of location-based social networks, point-of-interest (POI) recommendation has been attracting tremendous attention. Previous works for POI recommendation usually use matrix factorization (MF)-based methods, which achieve promising performance. However, existing MF-based methods suffer from two critical limitations: (1) Privacy issues: all users’ sensitive data are collected on a centralized server and may leak either on the server side or during transmission. (2) Poor resource utilization and training efficiency: training on a centralized server with potentially huge low-rank matrices is computationally inefficient. In this paper, we propose a novel decentralized gradient-quantization based matrix factorization (DGMF) framework to address the above limitations in POI recommendation. Compared with the centralized MF methods which store all sensitive data and low-rank matrices during model training, DGMF treats each user’s device (e.g., phone) as an independent learner and keeps the sensitive data on each user’s end. Furthermore, a privacy-preserving and communication-efficient mechanism with a gradient-quantization technique is presented to train the proposed model, which handles the privacy problem and reduces the communication cost in the decentralized setting. Theoretical guarantees of the proposed algorithm and experimental studies on real-world datasets demonstrate the effectiveness of the proposed algorithm.
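To make the gradient-quantization idea concrete, below is a minimal sketch of an unbiased stochastic quantizer that a user device could apply to its local MF gradient before communicating it; the function name, the number of levels, and the normalization are illustrative assumptions, not the exact scheme used in DGMF.

```python
import numpy as np

def quantize_gradient(grad, num_levels=4, rng=np.random.default_rng()):
    """Stochastically quantize a gradient vector to a few discrete levels.

    Each coordinate is rounded up or down to a neighbouring level with a
    probability that keeps the quantizer unbiased, so only the scale and
    small integer codes need to be exchanged between user devices.
    """
    scale = np.max(np.abs(grad))
    if scale == 0:
        return np.zeros_like(grad)
    normalized = np.abs(grad) / scale * num_levels     # values in [0, num_levels]
    lower = np.floor(normalized)
    prob_up = normalized - lower                        # keeps the quantizer unbiased
    levels = lower + (rng.random(grad.shape) < prob_up)
    return np.sign(grad) * levels * scale / num_levels

# Toy usage: a user quantizes its local gradient before sharing it.
local_grad = np.array([0.31, -0.07, 0.92, 0.0])
print(quantize_gradient(local_grad))
```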
Double Forward Propagation for Memorized Batch Normalization
Guo, Yong (South China University of Technology) | Wu, Qingyao (South China University of Technology) | Deng, Chaorui (South China University of Technology) | Chen, Jian (South China University of Technology) | Tan, Mingkui (South China University of Technology)
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). Although the standard BN can significantly accelerate the training of DNNs and improve the generalization performance, it has several underlying limitations which may hamper the performance in both training and inference. In the training stage, BN relies on estimating the mean and variance of data using a single mini-batch. Consequently, BN can be unstable when the batch size is very small or the data is poorly sampled. In the inference stage, BN often uses the so-called moving mean and moving variance instead of batch statistics, i.e., the training and inference rules in BN are not consistent. Regarding these issues, we propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics. Note that after the SGD update for each batch, the model parameters will change, and the features will change accordingly, leading to the Distribution Shift before and after the update for the considered batch. To alleviate this issue, we present a simple Double-Forward scheme in MBN which can further improve the performance. Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference. Empirical results show that the MBN based models trained with the Double-Forward scheme greatly reduce the sensitivity to the sampled data and significantly improve the generalization performance.
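A minimal sketch of the memorized-statistics idea follows, assuming a toy 1-D setting with a fixed-size buffer of recent batch means and variances and a schematic second forward pass; the class name, buffer size, and update rule are illustrative, not the exact MBN formulation.

```python
import numpy as np

class MemorizedBatchNorm:
    """Toy 1-D batch norm that pools statistics over the last `memory` batches."""

    def __init__(self, num_features, memory=4, eps=1e-5):
        self.means, self.vars = [], []
        self.memory, self.eps = memory, eps
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)

    def forward(self, x, update_stats=True):
        if update_stats:
            self.means.append(x.mean(axis=0))
            self.vars.append(x.var(axis=0))
            self.means = self.means[-self.memory:]
            self.vars = self.vars[-self.memory:]
        mean = np.mean(self.means, axis=0)   # statistics pooled over several batches
        var = np.mean(self.vars, axis=0)
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Double-Forward idea (schematic): after the SGD step changes the preceding layers,
# run a second forward pass so the memorized statistics match the updated features.
bn = MemorizedBatchNorm(num_features=3)
batch = np.random.randn(8, 3)
out_first = bn.forward(batch)              # first pass contributes to the statistics
# ... SGD update of the preceding layers would happen here ...
out_second = bn.forward(batch)             # second pass with refreshed statistics
```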
Feature Enhancement Network: A Refined Scene Text Detector
Zhang, Sheng (South China University of Technology) | Liu, Yuliang (South China University of Technology) | Jin, Lianwen (South China University of Technology) | Luo, Canjie (South China University of Technology)
In this paper, we propose a refined scene text detector with a novel Feature Enhancement Network (FEN) for region proposal and text detection refinement. Retrospectively, both region proposal with only a 3×3 sliding-window feature and text detection refinement with a single-scale high-level feature are insufficient, especially for smaller scene text. Therefore, we design a new FEN network with task-specific, low- and high-level semantic feature fusion to improve the performance of text detection. Besides, since unitary position-sensitive RoI pooling in general object detection is unreasonable for variable text regions, an adaptively weighted position-sensitive RoI pooling layer is devised to further enhance the detection accuracy. To tackle the sample-imbalance problem during the refinement stage, we also propose an effective positives mining strategy for efficiently training our network. Experiments on ICDAR 2011 and 2013 robust text detection benchmarks demonstrate that our method achieves state-of-the-art results, outperforming all reported methods in terms of F-measure.
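As a rough illustration of re-weighted position-sensitive pooling, the sketch below replaces the uniform average over RoI bins with softmax-normalized per-bin weights; the function name, the source of the weights, and the 3×3 bin layout are assumptions for illustration, not the layer proposed in the paper.

```python
import numpy as np

def adaptive_ps_roi_pool(ps_scores, weights):
    """Toy adaptively weighted position-sensitive RoI pooling.

    ps_scores: (k*k, C) array, one pooled score vector per spatial bin of an RoI.
    weights:   (k*k,) unnormalized per-bin weights (e.g., predicted from the RoI shape).
    Standard PS-RoI pooling averages the bins uniformly; here each bin's
    contribution is re-weighted, which matters for long, thin text regions.
    """
    w = np.exp(weights - weights.max())
    w /= w.sum()                           # softmax-normalized bin weights
    return (w[:, None] * ps_scores).sum(axis=0)

# Toy usage: 3x3 bins, 2 text/non-text scores per bin.
bins = np.random.randn(9, 2)
bin_weights = np.random.randn(9)
print(adaptive_ps_roi_pool(bins, bin_weights))
```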
Supervised Deep Hashing for Hierarchical Labeled Data
Wang, Dan (Beijing Institute of Technology) | Huang, Heyan (Beijing Institute of Technology) | Lu, Chi (Beijing Institute of Technology) | Feng, Bo-Si (Beijing Institute of Technology) | Wen, Guihua (South China University of Technology) | Nie, Liqiang (Shandong University) | Mao, Xian-Ling (Beijing Institute of Technology)
Recently, hashing methods have been widely used in large-scale image retrieval. However, most existing supervised hashing methods do not consider the hierarchical relation of labels, which means that they ignore the rich semantic information stored in the hierarchy. Moreover, most previous works treat each bit in a hash code equally, which does not meet the scenario of hierarchical labeled data. To tackle the aforementioned problems, in this paper, we propose a novel deep hashing method, called supervised hierarchical deep hashing (SHDH), to perform hash code learning for hierarchical labeled data. Specifically, we define a novel similarity formula for hierarchical labeled data by weighting each level, and design a deep neural network to obtain a hash code for each data point. Extensive experiments on two real-world public datasets show that the proposed method outperforms the state-of-the-art baselines in the image retrieval task.
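One simple way to picture a level-weighted similarity for hierarchical labels is sketched below: two label paths accumulate the weight of every level on which they agree; the function name, the normalization, and the example weights are illustrative and not the exact formula defined in the paper.

```python
def hierarchical_similarity(path_a, path_b, level_weights):
    """Toy level-weighted similarity for hierarchically labeled data.

    path_a / path_b: label paths from the root, e.g. ("animal", "dog", "husky").
    level_weights:   one weight per level, typically larger for coarser levels.
    The similarity accumulates the weight of every level on which the two label
    paths agree, so items sharing only coarse ancestors are less similar than
    items sharing the full path.
    """
    sim = 0.0
    for a, b, w in zip(path_a, path_b, level_weights):
        if a != b:
            break
        sim += w
    return sim / sum(level_weights)        # normalize to [0, 1]

print(hierarchical_similarity(("animal", "dog", "husky"),
                              ("animal", "dog", "poodle"),
                              level_weights=[0.5, 0.3, 0.2]))   # -> 0.8
```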
Selecting Proper Multi-Class SVM Training Methods
Chen, Yawen (South China University of Technology) | Wen, Zeyi (National University of Singapore) | Chen, Jian (South China University of Technology) | Huang, Jin (South China Normal University)
Support Vector Machines (SVMs) are excellent candidate solutions to solving multi-class problems, and multi-class SVMs can be trained by several different methods. Different training methods commonly produce SVMs with different effectiveness, and no multi-class SVM training method always outperforms the others on all problems. This makes it difficult for practitioners to choose the best training method for a given problem. In this work, we propose a Multi-class Method Selection (MMS) approach to help users select the most appropriate method among one-versus-one (OVO), one-versus-all (OVA) and structural SVMs (SSVMs) for a given problem. Our key idea is to select the training method based on the distribution of the training data and the similarity between different classes. Using the distribution and class similarity, we estimate the unclassifiable rate of each multi-class SVM training method, and select the training method with the minimum unclassifiable rate. Our initial findings show: (i) SSVMs with linear kernel perform worse than OVO and OVA; (ii) MMS often produces SVM classifiers that can confidently classify unseen instances.
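The selection step can be pictured with the toy sketch below: each candidate training method is assigned an estimated unclassifiable rate and the method with the smallest estimate is chosen; the numbers are made up for illustration, whereas MMS estimates these rates from the data distribution and class similarity.

```python
# Toy selection step in the spirit of MMS. The estimates below are made-up
# placeholders; in MMS they are derived from the training data distribution
# and pairwise class similarity.
estimated_unclassifiable_rate = {
    "OVO": 0.04,    # e.g., ties in one-versus-one voting
    "OVA": 0.07,    # e.g., regions rejected by every one-versus-all classifier
    "SSVM": 0.06,
}
best_method = min(estimated_unclassifiable_rate, key=estimated_unclassifiable_rate.get)
print(f"Selected multi-class SVM training method: {best_method}")
```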
Unified Locally Linear Classifiers With Diversity-Promoting Anchor Points
Liu, Chenghao (Zhejiang University, China) | Zhang, Teng (Singapore Management University, Singapore) | Zhao, Peilin (Zhejiang University) | Sun, Jianling (Alibaba-Zhejiang University Joint Institute of Frontier Technologies) | Hoi, Steven C. H. (South China University of Technology)
Locally Linear Support Vector Machine (LLSVM) has been actively used in classification tasks due to its capability of classifying nonlinear patterns. However, existing LLSVM suffers from two drawbacks: (1) a particular and appropriate regularization for LLSVM has not yet been addressed; (2) it usually adopts a three-stage learning scheme composed of learning anchor points by clustering, learning local coding coordinates by a predefined coding scheme, and finally training the classifiers. We argue that this decoupled approach oversimplifies the original optimization problem, resulting in a large deviation due to the disparate purpose of each step. To address the first issue, we propose a novel diversified regularization which can capture infrequent patterns and reduce the model size without sacrificing the representation power. Based on this regularization, we develop a joint optimization algorithm over anchor points, local coding coordinates and classifiers to simultaneously minimize the overall classification risk, which is termed Diversified and Unified Locally Linear Support Vector Machine (DU-LLSVM for short). To the best of our knowledge, DU-LLSVM is the first principled method that directly learns sparse local coding and can be easily generalized to other supervised learning models. Extensive experiments show that DU-LLSVM consistently surpasses several state-of-the-art methods with a predefined local coding scheme (e.g., LLSVM) or a supervised anchor point learning scheme (e.g., SAPL-LLSVM).
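As a rough illustration of a diversity-promoting regularizer on anchor points, the sketch below penalizes large pairwise inner products between anchors so that minimizing it spreads the anchors out; this is only one simple way to encode diversity and not the exact regularization term proposed for DU-LLSVM.

```python
import numpy as np

def diversity_penalty(anchors):
    """Toy diversity-promoting regularizer over anchor points.

    anchors: (m, d) matrix of anchor points. The penalty grows when anchors are
    similar (large pairwise inner products), so minimizing it pushes anchors
    apart and encourages them to also cover infrequent patterns.
    """
    gram = anchors @ anchors.T
    off_diag = gram - np.diag(np.diag(gram))   # ignore each anchor's self-similarity
    return np.sum(off_diag ** 2)

anchors = np.random.randn(5, 3)
print(diversity_penalty(anchors))
```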
T-C3D: Temporal Convolutional 3D Network for Real-Time Action Recognition
Liu, Kun (Beijing University of Posts and Telecommunications) | Liu, Wu (Beijing University of Posts and Telecommunications) | Gan, Chuang (Tsinghua University) | Tan, Mingkui (South China University of Technology) | Ma, Huadong (Beijing University of Posts and Telecommunications)
Video-based action recognition with deep neural networks has shown remarkable progress. However, most of the existing approaches are too computationally expensive due to their complex network architectures. To address these problems, we propose a new real-time action recognition architecture, called Temporal Convolutional 3D Network (T-C3D), which learns video action representations in a hierarchical multi-granularity manner. Specifically, we combine a residual 3D convolutional neural network, which captures complementary information on the appearance of a single frame and the motion between consecutive frames, with a new temporal encoding method to explore the temporal dynamics of the whole video. Thus heavy calculations are avoided at inference time, which makes the method capable of real-time processing. On two challenging benchmark datasets, UCF101 and HMDB51, our method outperforms state-of-the-art real-time methods by over 5.4% in accuracy and is 2 times faster in inference speed (969 frames per second), while demonstrating recognition performance comparable to the state-of-the-art methods. The source code for the complete system as well as the pre-trained models are publicly available at https://github.com/tc3d.
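The temporal-encoding idea can be pictured with the toy sketch below: clip-level predictions from a few sampled segments are aggregated into a single video-level prediction; the aggregation function (a plain average here) and the segment count are illustrative assumptions rather than the specific encoding studied in T-C3D.

```python
import numpy as np

def temporal_encode(segment_scores):
    """Toy video-level temporal encoding.

    segment_scores: (num_segments, num_classes) clip-level predictions from a
    3D CNN applied to a few uniformly sampled segments of the video. A simple
    aggregation (the average here) yields one video-level prediction, so the
    whole video is covered without running the network on every frame.
    """
    return segment_scores.mean(axis=0)

clip_scores = np.random.rand(3, 101)       # e.g., 3 segments, 101 UCF101 classes
video_pred = int(np.argmax(temporal_encode(clip_scores)))
print("predicted class index:", video_pred)
```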
Efficient Support Vector Machine Training Algorithm on GPUs
Shi, Jiashuai (South China University of Technology) | Wen, Zeyi (National University of Singapore) | He, Bingsheng (National University of Singapore) | Chen, Jian (South China University of Technology)
Support Vector Machines (SVMs) are popular for many machine learning tasks. With the rapid growth of dataset sizes, the high cost of training limits the wide use of SVMs. Several SVM implementations on GPUs have been proposed to accelerate SVM training. However, they support only classification (SVC) or regression (SVR). In this work, we propose a simple and effective SVM training algorithm on GPUs which can be used for SVC, SVR and one-class SVM. Initial experiments show that our implementation outperforms existing ones. We are in the process of encapsulating our algorithm into an easy-to-use library which has Python, R and MATLAB interfaces.
Improving Efficiency of SVM k-Fold Cross-Validation by Alpha Seeding
Wen, Zeyi (The University of Melbourne) | Li, Bin (South China University of Technology) | Kotagiri, Ramamohanarao (The University of Melbourne) | Chen, Jian (South China University of Technology) | Chen, Yawen (South China University of Technology) | Zhang, Rui (The University of Melbourne)
The k-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with the selected hyper-parameters. It is known that SVM k-fold cross-validation is expensive, since it requires training k SVMs. However, little work has explored reusing the h-th SVM for training the (h+1)-th SVM to improve the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the h-th SVM for improving the efficiency of training the (h+1)-th SVM. Our key idea is to efficiently identify the support vectors and accurately estimate their associated weights (also called alpha values) of the next SVM by using the previous SVM. Our experimental results show that our algorithms are several times faster than ordinary k-fold cross-validation, which does not make use of the previously trained SVM. Moreover, our algorithms produce the same results (hence the same accuracy) as ordinary k-fold cross-validation.
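A minimal sketch of the alpha-seeding idea follows (simplified, not any of the three proposed algorithms in particular): alpha values of points shared between the h-th and (h+1)-th training sets are carried over as a warm start, while new points start from zero; the function name and index handling are illustrative, and a real solver would then adjust the seeded alphas to restore optimality.

```python
import numpy as np

def seed_alphas(prev_alphas, prev_index, next_index):
    """Toy alpha seeding between consecutive folds of k-fold cross-validation.

    prev_alphas: alpha values learned when training the h-th SVM.
    prev_index:  indices (into the full dataset) of the h-th training set.
    next_index:  indices of the (h+1)-th training set.
    Points shared by both training sets keep their previous alpha values as a
    warm start; points new to the (h+1)-th fold start from zero.
    """
    alpha_by_point = dict(zip(prev_index, prev_alphas))
    return np.array([alpha_by_point.get(i, 0.0) for i in next_index])

prev_idx = [0, 1, 2, 3, 4, 5]
next_idx = [2, 3, 4, 5, 6, 7]
print(seed_alphas(np.array([0.0, 0.3, 0.0, 0.7, 0.2, 0.0]), prev_idx, next_idx))
```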
S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions
Mao, Xian-Ling (Beijing Institute of Technology) | Feng, Bo-Si (Beijing Institute of Technology) | Hao, Yi-Jing (Beijing Institute of Technology) | Nie, Liqiang (National University of Singapore) | Huang, Heyan (Beijing Institute of Technology) | Wen, Guihua (South China University of Technology)
To compare the similarity of probability distributions, information-theoretically motivated metrics like the Kullback-Leibler divergence (KL) and the Jensen-Shannon divergence (JSD) are often more reasonable than metrics for vectors like the Euclidean and angular distances. However, existing locality-sensitive hashing (LSH) algorithms cannot support the information-theoretically motivated metrics for probability distributions. In this paper, we first introduce a new approximation formula for the S2JSD-distance, and then propose a novel LSH scheme adapted to the S2JSD-distance for approximate nearest neighbor search in high-dimensional probability distributions. We define the specific hashing functions, and prove their locality-sensitivity. Furthermore, extensive empirical evaluations well illustrate the effectiveness of the proposed hashing schema on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curve.
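For intuition only, the sketch below builds a classic random-projection (p-stable style) hash and applies it to probability distributions represented as vectors; this generic Euclidean-style family is an illustrative stand-in and not the S2JSD-adapted hashing functions constructed in the paper.

```python
import numpy as np

def make_lsh_hash(dim, bucket_width=0.5, rng=np.random.default_rng(0)):
    """Build one random-projection hash in the classic p-stable LSH style.

    Applied here to probability distributions represented as vectors; this is
    a generic Euclidean-style hash for illustration only, not the S2JSD-based
    family proposed in the paper.
    """
    a = rng.normal(size=dim)               # random projection direction
    b = rng.uniform(0, bucket_width)       # random offset
    return lambda p: int(np.floor((np.dot(a, p) + b) / bucket_width))

h = make_lsh_hash(dim=4)
p = np.array([0.1, 0.2, 0.3, 0.4])         # a probability distribution
q = np.array([0.12, 0.18, 0.3, 0.4])       # a nearby distribution
print(h(p), h(q))                          # nearby distributions often collide
```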