Yang, Peng
Evolutionary Reinforcement Learning via Cooperative Coevolutionary Negatively Correlated Search
Zhang, Hu, Yang, Peng, Yu, Yanglong, Li, Mingjia, Tang, Ke
Evolutionary algorithms (EAs) have been successfully applied to optimize policies for Reinforcement Learning (RL) tasks due to their exploration ability. The recently proposed Negatively Correlated Search (NCS) provides a distinct parallel exploration search behavior and is expected to facilitate RL more effectively. Considering that commonly adopted neural policies usually involve millions of parameters to be optimized, the direct application of NCS to RL may face the great challenge of a large-scale search space. To address this issue, this paper presents an NCS-friendly Cooperative Coevolution (CC) framework to scale up NCS while largely preserving its parallel exploration search behavior. We also discuss why traditional CC can deteriorate NCS. Empirical studies on 10 popular Atari games show that the proposed method significantly outperforms three state-of-the-art deep RL methods with 50% less computational time by effectively exploring a 1.7 million-dimensional search space.
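The mechanics of the scale-up are straightforward to sketch: the large policy parameter vector is split into sub-components, and each sub-component is searched while the remaining dimensions are held fixed in a context vector. Below is a minimal numpy sketch of such a CC loop; the random static grouping, the simple inner search, and the toy fitness function are illustrative placeholders, not the paper's NCS-friendly design.

```python
import numpy as np

def fitness(theta):
    # Placeholder for an RL policy evaluation (e.g., episode return on Atari).
    return -np.sum(theta ** 2)

def cc_optimize(dim=10_000, n_groups=10, iters=50, pop=8, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    context = rng.normal(0.0, 0.01, dim)               # best-so-far full solution
    groups = np.array_split(np.arange(dim), n_groups)  # static grouping of dimensions
    for _ in range(iters):
        for idx in groups:                             # conquer one sub-problem at a time
            best_f = fitness(context)
            for _ in range(pop):                       # simple (1+pop) search per group
                cand = context.copy()
                cand[idx] += rng.normal(0.0, sigma, idx.size)
                f = fitness(cand)                      # evaluate with the rest held fixed
                if f > best_f:
                    best_f, context = f, cand
    return context, fitness(context)

theta, f = cc_optimize()
print(f"final fitness: {f:.4f}")
```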
Negatively Correlated Search as a Parallel Exploration Search Strategy
Yang, Peng, Tang, Ke, Yao, Xin
Parallel exploration is key to successful search. The recently proposed Negatively Correlated Search (NCS) achieves this ability by constructing a set of negatively correlated search processes and has been applied to many real-world problems. In NCS, the key technique is to explicitly model and maximize the diversity among the parallel search processes. However, the original diversity model was mostly devised by intuition, which introduced several drawbacks to NCS. In this paper, a mathematically principled diversity model is proposed to resolve these drawbacks, resulting in a new NCS framework. A new instantiation of NCS is also derived, and its effectiveness is verified on a set of multi-modal continuous optimization problems.
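To make the idea concrete, here is a minimal sketch of an NCS-style loop: several Gaussian search processes run in parallel, and an offspring replaces its parent only when it trades off fitness against distance from the other processes. The Bhattacharyya distance between isotropic Gaussians and the ratio-based acceptance rule below are simplified stand-ins, not the principled model proposed in the paper.

```python
import numpy as np

def sphere(x):                      # toy minimization objective
    return float(np.sum(x ** 2))

def bhattacharyya(mu_a, mu_b, sigma):
    # Closed form for two isotropic Gaussians N(mu, sigma^2 I).
    return float(np.sum((mu_a - mu_b) ** 2) / (8.0 * sigma ** 2))

def ncs(dim=10, n_proc=5, iters=200, sigma=0.3, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mus = rng.uniform(-3, 3, (n_proc, dim))     # one Gaussian mean per process
    for _ in range(iters):
        for i in range(n_proc):
            child = mus[i] + rng.normal(0.0, sigma, dim)
            others = [j for j in range(n_proc) if j != i]
            # Diversity of parent/child = distance to the closest other process.
            d_parent = min(bhattacharyya(mus[i], mus[j], sigma) for j in others)
            d_child = min(bhattacharyya(child, mus[j], sigma) for j in others)
            f_parent, f_child = sphere(mus[i]), sphere(child)
            # Accept when the child's fitness/diversity trade-off beats the parent's.
            if f_child / (d_child + 1e-12) < lam * f_parent / (d_parent + 1e-12):
                mus[i] = child
    best = min(mus, key=sphere)
    return best, sphere(best)

best, f = ncs()
print(f"best objective: {f:.4f}")
```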
Fast mmWave Beam Alignment via Correlated Bandit Learning
Wu, Wen, Cheng, Nan, Zhang, Ning, Yang, Peng, Zhuang, Weihua, Shen, Xuemin
Beam alignment (BA) ensures that the transmitter and receiver beams are accurately aligned to establish a reliable communication link in millimeter-wave (mmWave) systems. Existing BA methods search the entire beam space to identify the optimal transmit-receive beam pair, which incurs significant BA latency, on the order of seconds in the worst case. In this paper, we develop a learning algorithm to reduce BA latency, namely the Hierarchical Beam Alignment (HBA) algorithm. We first formulate the BA problem as a stochastic multi-armed bandit problem with the objective of maximizing the cumulative received signal strength within a certain period. The proposed algorithm takes advantage of the correlation structure among beams, such that information from nearby beams is extracted to identify the optimal beam instead of searching the entire beam space. Furthermore, prior knowledge of the channel fluctuation is incorporated in the proposed algorithm to further accelerate the BA process. Theoretical analysis indicates that the proposed algorithm is asymptotically optimal. Extensive simulation results demonstrate that, compared to the existing BA method in IEEE 802.11ad, the proposed algorithm can identify the optimal beam with high probability and reduce the BA latency from hundreds of milliseconds to a few milliseconds in the multipath channel.
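The hierarchical idea can be illustrated with a two-stage bandit: a coarse UCB stage narrows the search to the most promising sector, and a fine UCB stage searches only that sector's beams. In the sketch below, the two-level codebook, the Gaussian reward shape encoding beam correlation, and plain UCB1 are illustrative assumptions rather than the HBA algorithm itself.

```python
import numpy as np

def ucb(means_hat, counts, t, c=2.0):
    # Standard UCB1 index; untried arms get priority via an infinite index.
    with np.errstate(divide="ignore", invalid="ignore"):
        bonus = np.sqrt(c * np.log(t) / counts)
    return np.where(counts == 0, np.inf, means_hat + bonus)

def run_ucb(rewards_mean, rounds, rng, noise=0.1):
    k = len(rewards_mean)
    means_hat, counts = np.zeros(k), np.zeros(k)
    for t in range(1, rounds + 1):
        arm = int(np.argmax(ucb(means_hat, counts, t)))
        r = rewards_mean[arm] + rng.normal(0.0, noise)   # noisy signal strength
        counts[arm] += 1
        means_hat[arm] += (r - means_hat[arm]) / counts[arm]
    return int(np.argmax(means_hat))

rng = np.random.default_rng(0)
n_sectors, beams_per_sector = 8, 8
beams = np.arange(n_sectors * beams_per_sector)
best_beam = 37
# Correlated signal strength: nearby beams have similar mean reward.
signal = np.exp(-((beams - best_beam) ** 2) / 50.0)

sector_means = signal.reshape(n_sectors, beams_per_sector).mean(axis=1)
sector = run_ucb(sector_means, rounds=200, rng=rng)          # coarse stage
fine = signal[sector * beams_per_sector:(sector + 1) * beams_per_sector]
beam = sector * beams_per_sector + run_ucb(fine, 200, rng)   # fine stage
print(f"identified beam {beam}, true best {best_beam}")
```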
SupportNet: solving catastrophic forgetting in class incremental learning with support data
Li, Yu, Li, Zhongxiao, Ding, Lizhong, Yang, Peng, Hu, Yuhui, Chen, Wei, Gao, Xin
A plain well-trained deep learning model often does not have the ability to learn new knowledge without forgetting the previously learned knowledge, a phenomenon known as catastrophic forgetting. Here we propose a novel method, SupportNet, to solve the catastrophic forgetting problem in the class-incremental learning scenario efficiently and effectively. SupportNet combines the strength of deep learning and the support vector machine (SVM): the SVM is used to identify the support data from the old data, which are fed to the deep learning model together with the new data for further training, so that the model can review the essential information of the old data when learning the new information. Two powerful consolidation regularizers are applied to ensure the robustness of the learned model. Comprehensive experiments on various tasks, including enzyme function prediction, subcellular structure classification, and breast tumor classification, show that SupportNet drastically outperforms the state-of-the-art incremental learning methods and even reaches performance similar to that of a deep learning model trained from scratch on both the old and new data. Our program is accessible at: https://github.com/lykaust15/SupportNet
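The support-data selection step is easy to sketch: fit an SVM on the old classes' learned features and keep only its support vectors as the rehearsal set. The snippet below illustrates this with random vectors standing in for deep features (in the real method they would come from the network's penultimate layer), and it omits the consolidation regularizers.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for deep features of two old classes (500 samples total).
old_feats = np.vstack([rng.normal(0, 1, (250, 64)), rng.normal(2, 1, (250, 64))])
old_labels = np.array([0] * 250 + [1] * 250)

# Identify support data: the SVM's support vectors are the boundary samples
# that carry most of the information needed to separate the old classes.
svm = SVC(kernel="linear").fit(old_feats, old_labels)
support_idx = svm.support_
support_feats = old_feats[support_idx]
support_labels = old_labels[support_idx]
print(f"kept {len(support_idx)} of {len(old_feats)} old samples as support data")

# A new class arrives; the rehearsal batch mixes support data with new data,
# and the deep model would be fine-tuned on this combined set.
new_feats = rng.normal(-2, 1, (250, 64))
new_labels = np.full(250, 2)
train_x = np.vstack([support_feats, new_feats])
train_y = np.concatenate([support_labels, new_labels])
```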
Randomized Kernel Selection With Spectra of Multilevel Circulant Matrices
Ding, Lizhong (King Abdullah University of Science and Technology (KAUST)) | Liao, Shizhong (Tianjin University) | Liu, Yong (CAS, Beijing, Institute of Information Engineering) | Yang, Peng (King Abdullah University of Science and Technology (KAUST)) | Gao, Xin (King Abdullah University of Science and Technology (KAUST))
Kernel selection aims at choosing an appropriate kernel function for kernel-based learning algorithms, so as to avoid either underfitting or overfitting of the resulting hypothesis. One of the main problems faced by kernel selection is the evaluation of the goodness of a kernel, which is typically difficult and computationally expensive. In this paper, we propose a randomized kernel selection approach that evaluates and selects the kernel with the spectra of specifically designed multilevel circulant matrices (MCMs), and is both statistically sound and computationally efficient. Instead of constructing the kernel matrix, we construct a randomized MCM that encodes the kernel function and all data points together with their labels. We build a one-to-one correspondence between all candidate kernel functions and the spectra of the randomized MCMs via the Fourier transform. We prove the statistical properties of the randomized MCMs and the randomized kernel selection criteria, which theoretically justify the utility of the randomized criteria in kernel selection. With the spectra of the randomized MCMs, we derive a series of randomized criteria for kernel selection, which can be computed in log-linear time and linear space via the fast Fourier transform (FFT). Experimental results demonstrate that our randomized kernel selection criteria are significantly more efficient than the existing classic and widely used criteria while preserving similar predictive performance.
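The computational core can be illustrated in a few lines: a circulant matrix built from a stationary kernel is diagonalized by the Fourier basis, so its entire spectrum costs one FFT rather than an eigendecomposition. In the sketch below, a one-level circulant matrix stands in for the multilevel MCMs of the paper, and the log-determinant of the regularized matrix serves as a stand-in selection criterion.

```python
import numpy as np

n, lam = 1024, 1e-2
grid = np.arange(n) / n

def gaussian_kernel(t, gamma):
    # Stationary kernel evaluated on the wrapped grid distance;
    # the circular distance keeps the resulting matrix circulant.
    d = np.minimum(t, 1.0 - t)
    return np.exp(-gamma * d ** 2)

for gamma in [1.0, 10.0, 100.0]:
    first_row = gaussian_kernel(grid, gamma)
    # Eigenvalues of a circulant matrix C = FFT of its first row: O(n log n).
    eig = np.real(np.fft.fft(first_row))
    # Example criterion computable directly from the spectrum.
    log_det = float(np.sum(np.log(eig + lam)))
    print(f"gamma={gamma:6.1f}  log det(C + lam*I) = {log_det:10.2f}")
```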
Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback
Yang, Peng, Zhao, Peilin, Gao, Xin, Liu, Yong
Recommendation is the task of improving customer experience through personalized recommendations based on users' past feedback. In this paper, we investigate the most common scenario: the user-item (U-I) matrix of implicit feedback. Although many recommendation approaches are designed based on implicit feedback, they attempt to project the U-I matrix into a low-rank latent space, a strict restriction that rarely holds in practice. In addition, although the misclassification costs of imbalanced classes differ significantly, few methods take the cost of classification error into account. To address these issues, we propose a robust framework that decomposes the U-I matrix into two components: (1) a low-rank matrix that captures the common preference, and (2) a sparse matrix that detects the user-specific preferences of individuals. A cost-sensitive learning model is embedded into the framework. Specifically, this model exploits different costs in the loss function for the observed and unobserved instances. We show that the resulting non-smooth convex objective can be optimized efficiently by an accelerated projected gradient method with closed-form solutions. Moreover, the proposed algorithm can be scaled up to large datasets after a relaxation. The theoretical result shows that even with a small fraction of 1's in the U-I matrix $M\in\mathbb{R}^{n\times m}$, the cost-sensitive error of the proposed model is upper bounded by $O(\frac{\alpha}{\sqrt{mn}})$, where $\alpha$ is a bias over the imbalanced classes. Finally, extensive empirical experiments are carried out to evaluate the effectiveness of our proposed algorithm. Encouraging results show that our algorithm outperforms several state-of-the-art algorithms on benchmark recommendation datasets.
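A minimal sketch of the decomposition is shown below: observed and unobserved entries receive different cost weights, the low-rank component is updated by singular value thresholding, and the sparse component by entrywise soft thresholding. The plain (non-accelerated) proximal loop, step size, and cost weights are simplifications of the paper's accelerated algorithm.

```python
import numpy as np

def svt(X, tau):
    # Proximal operator of the nuclear norm: shrink singular values by tau.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    # Proximal operator of the l1 norm: entrywise shrinkage.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

rng = np.random.default_rng(0)
n, m = 60, 40
M = (rng.random((n, m)) < 0.05).astype(float)     # sparse implicit feedback
cost = np.where(M == 1.0, 10.0, 1.0)              # misclassifying a 1 costs more

L = np.zeros((n, m)); S = np.zeros((n, m))
step, lam_nuc, lam_l1 = 0.05, 0.5, 0.01
for _ in range(200):
    grad = cost * (L + S - M)                     # gradient of the weighted loss
    L = svt(L - step * grad, step * lam_nuc)      # common-preference component
    S = soft(S - step * grad, step * lam_l1)      # user-specific component
loss = float(0.5 * np.sum(cost * (L + S - M) ** 2))
print(f"weighted reconstruction loss: {loss:.3f}")
```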
Robust Online Multi-Task Learning with Correlative and Personalized Structures
Yang, Peng, Zhao, Peilin, Gao, Xin
Multi-Task Learning (MTL) can enhance a classifier's generalization performance by learning multiple related tasks simultaneously. Conventional MTL works in the offline or batch setting and suffers from expensive training cost and poor scalability. To address these inefficiency issues, online learning techniques have been applied to solve MTL problems. However, most existing online MTL algorithms constrain task relatedness into a presumed structure via a single weight matrix, a strict restriction that does not always hold in practice. In this paper, we propose a robust online MTL framework that overcomes this restriction by decomposing the weight matrix into two components: the first captures the low-rank common structure among tasks via a nuclear norm, and the second identifies the personalized patterns of outlier tasks via a group lasso. Theoretical analysis shows that the proposed algorithm can achieve sub-linear regret with respect to the best linear model in hindsight. Although this framework achieves good performance, the nuclear norm, which simply adds all nonzero singular values together, may not be a good low-rank approximation. To improve the results, we use a log-determinant function as a non-convex rank approximation. A gradient scheme is applied to optimize the log-determinant function, yielding a closed-form solution for this refined problem. Experimental results on a number of real-world applications verify the efficacy of our method.
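The online update can be sketched as follows: each round, every task takes a subgradient step on its own example, after which the first component is projected toward low rank (nuclear-norm prox) and the second toward column sparsity (group-lasso prox, which zeroes out non-outlier tasks). The hinge loss, step sizes, and toy data below are placeholders, and the log-determinant rank surrogate is omitted.

```python
import numpy as np

def prox_nuclear(X, tau):
    # Shrink singular values: pushes the common component toward low rank.
    U_, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U_ @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_group(X, tau):
    # Column-wise group shrinkage: whole tasks survive or vanish.
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    return X * np.maximum(1.0 - tau / (norms + 1e-12), 0.0)

rng = np.random.default_rng(0)
d, k, rounds = 20, 5, 500
W_true = rng.normal(0, 1, (d, 1)) @ rng.normal(0, 1, (1, k))  # rank-1 tasks
U = np.zeros((d, k)); V = np.zeros((d, k))
eta, lam_nuc, lam_grp = 0.05, 0.02, 0.02
for _ in range(rounds):
    X = rng.normal(0, 1, (d, k))                  # one example per task
    y = np.sign(np.sum(X * W_true, axis=0))
    margin = y * np.sum(X * (U + V), axis=0)
    G = -X * (y * (margin < 1.0))                 # hinge-loss subgradient
    U = prox_nuclear(U - eta * G, eta * lam_nuc)  # common structure
    V = prox_group(V - eta * G, eta * lam_grp)    # outlier-task patterns
acc = np.mean(np.sign(np.sum(X * (U + V), axis=0)) == y)
print(f"last-round accuracy: {acc:.2f}")
```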
Incorporating Expert Knowledge into Keyphrase Extraction
Gollapalli, Sujatha Das (Institute for Infocomm Research, A*STAR) | Li, Xiao-li (Institute for Infocomm Research, A*STAR) | Yang, Peng (Tencent AI Lab)
Keyphrases that efficiently summarize a document’s content are used in various document processing and retrieval tasks. Current state-of-the-art techniques for keyphrase extraction operate at the phrase level and involve scoring candidate phrases based on features of their component words. In this paper, we learn keyphrase taggers for research papers using token-based features incorporating linguistic, surface-form, and document-structure information through sequence labeling. We experimentally illustrate that, using within-document features alone, our tagger trained with Conditional Random Fields performs on par with existing state-of-the-art systems that rely on information from Wikipedia and citation networks. In addition, we are also able to harness recent work on feature labeling to seamlessly incorporate expert knowledge and predictions from existing systems to further enhance extraction performance. We highlight the modeling advantages of our keyphrase taggers and show significant performance improvements on two recently compiled datasets of keyphrases from Computer Science research papers.
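A minimal sketch of token-level keyphrase tagging as sequence labeling is given below, using BIO labels and a handful of surface-form and position features. The feature template is a simplified stand-in for the paper's linguistic, surface-form, and document-structure features, and sklearn_crfsuite is assumed as the CRF implementation.

```python
import sklearn_crfsuite

def token_features(tokens, i):
    # Token-based features: surface form, shape cues, and relative position.
    w = tokens[i]
    return {
        "lower": w.lower(),
        "is_title": w.istitle(),
        "is_upper": w.isupper(),
        "suffix3": w[-3:],
        "rel_pos": round(i / len(tokens), 1),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Tiny toy corpus with BIO keyphrase labels.
sents = [
    ("we study keyphrase extraction from research papers".split(),
     ["O", "O", "B-KP", "I-KP", "O", "B-KP", "I-KP"]),
    ("conditional random fields for sequence labeling".split(),
     ["B-KP", "I-KP", "I-KP", "O", "B-KP", "I-KP"]),
]
X = [[token_features(toks, i) for i in range(len(toks))] for toks, _ in sents]
y = [labels for _, labels in sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])
```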
High-dimensional Black-box Optimization via Divide and Approximate Conquer
Yang, Peng, Tang, Ke, Yao, Xin
Divide and Conquer (DC) is conceptually well suited to high-dimensional optimization, as it decomposes a problem into multiple small-scale sub-problems. However, appealing performance can seldom be observed when the sub-problems are interdependent. This paper suggests that the major difficulty of tackling interdependent sub-problems lies in the precise evaluation of a partial solution (to a sub-problem), which can be overwhelmingly costly and thus makes sub-problems non-trivial to conquer. We therefore propose an approximation approach, named Divide and Approximate Conquer (DAC), which reduces the cost of partial-solution evaluation from exponential time to polynomial time, while still guaranteeing convergence to the global optimum (of the original problem). The effectiveness of DAC is demonstrated empirically on two sets of non-separable high-dimensional problems.
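The difficulty DAC targets can be made concrete: the "true" quality of a partial solution is its best objective value over all completions of the remaining variables, which is intractable to compute exactly. The sketch below approximates it by sampling a few random completions; this naive Monte Carlo stand-in illustrates the problem setting but not DAC's approximation with its convergence guarantee.

```python
import numpy as np

def f(x):
    # Non-separable toy objective: neighboring variables interact.
    return float(np.sum(x ** 2) + 0.5 * np.sum(x[:-1] * x[1:]))

def approx_eval(partial, idx, dim, rng, n_samples=16):
    # Approximate the min over completions by sampling the complement variables.
    best = np.inf
    for _ in range(n_samples):
        full = rng.uniform(-1, 1, dim)
        full[idx] = partial                 # plug the partial solution in
        best = min(best, f(full))
    return best

rng = np.random.default_rng(0)
dim, n_groups = 20, 4
x = rng.uniform(-1, 1, dim)
groups = np.array_split(np.arange(dim), n_groups)
for _ in range(30):                         # divide-and-conquer sweeps
    for idx in groups:
        cand = x[idx] + rng.normal(0.0, 0.2, idx.size)
        # Keep the candidate partial solution if its approximate value wins.
        if approx_eval(cand, idx, dim, rng) < approx_eval(x[idx], idx, dim, rng):
            x[idx] = cand
print(f"objective after sweeps: {f(x):.4f}")
```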