Peng, Yijie
Sample-Efficient Clustering and Conquer Procedures for Parallel Large-Scale Ranking and Selection
Zhang, Zishi, Peng, Yijie
We propose novel "clustering and conquer" procedures for the parallel large-scale ranking and selection (R&S) problem, which leverage correlation information for clustering to break the sample-efficiency bottleneck. In parallel computing environments, correlation-based clustering can achieve an $\mathcal{O}(p)$ reduction in sample complexity, the optimal reduction rate theoretically attainable. Our proposed framework is versatile, allowing seamless integration of various prevalent R&S methods under both fixed-budget and fixed-precision paradigms, and it achieves these improvements without requiring highly accurate correlation estimation or precise clustering. In large-scale AI applications such as neural architecture search, a screening-free version of our procedure surprisingly surpasses fully-sequential benchmarks in sample efficiency. This suggests that leveraging valuable structural information, such as correlation, is a viable path to bypassing the traditional need for screening via pairwise comparison, a step previously deemed essential for high sample efficiency but problematic for parallelization. Additionally, we propose a parallel few-shot clustering algorithm tailored for large-scale problems.
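To make the pipeline concrete, the following is a minimal Python sketch of the three-stage idea as we read it: pilot sampling to estimate correlations, clustering, then "conquering" within clusters before comparing cluster winners. The `sampler` interface, the average-linkage clustering step, and the budget splits are all illustrative assumptions, not the paper's actual procedure.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform

    def cluster_and_conquer(sampler, k, n_pilot=20, n_clusters=10, budget=10_000, seed=0):
        """Illustrative clustering-and-conquer loop for selecting the best of k designs.

        `sampler(i, rng)` returns one noisy observation of design i's performance.
        """
        rng = np.random.default_rng(seed)
        # Stage 1: pilot sampling to estimate the correlation structure across designs.
        pilot = np.array([[sampler(i, rng) for _ in range(n_pilot)] for i in range(k)])
        corr = np.corrcoef(pilot)
        # Stage 2: group highly correlated designs; average-linkage clustering stands
        # in here for the paper's parallel few-shot clustering algorithm.
        dist = 1.0 - corr
        np.fill_diagonal(dist, 0.0)
        labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                          t=n_clusters, criterion="maxclust")

        means, counts = pilot.mean(axis=1), np.full(k, float(n_pilot))

        def draw(i, n):  # sample design i n more times and update its running mean
            xs = [sampler(i, rng) for _ in range(n)]
            means[i] = (means[i] * counts[i] + sum(xs)) / (counts[i] + n)
            counts[i] += n

        # Stage 3: within each cluster (parallelizable across workers), a small
        # equal allocation picks a provisional cluster winner.
        per_design = (budget // 2) // k
        for i in range(k):
            draw(i, per_design)
        winners = [max(np.flatnonzero(labels == c), key=lambda i: means[i])
                   for c in np.unique(labels)]
        # Stage 4: the remaining budget is spent only on the cluster winners.
        left = max(budget - k * (n_pilot + per_design), 0) // len(winners)
        for i in winners:
            draw(i, left)
        return max(winners, key=lambda i: means[i])

A toy instance such as `sampler = lambda i, rng: mu[i] + rng.normal()` exercises the routine end to end; the sample savings come from Stage 4 concentrating the budget on one representative per cluster.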
AlphaRank: An Artificial Intelligence Approach for Ranking and Selection Problems
Zhou, Ruihan, Hong, L. Jeff, Peng, Yijie
We introduce AlphaRank, an artificial intelligence approach to fixed-budget ranking and selection (R&S) problems. We formulate the sequential sampling decision as a Markov decision process and propose a Monte Carlo simulation-based rollout policy that utilizes classic R&S procedures as base policies for efficiently learning the value function of the stochastic dynamic program. We accelerate online sample allocation by using deep reinforcement learning to pre-train a neural network model offline based on a given prior. We also propose a parallelizable computing framework for large-scale problems, effectively combining "divide and conquer" and "recursion" for enhanced scalability and efficiency. Numerical experiments demonstrate that AlphaRank significantly outperforms the base policies, which can be attributed to its superior handling of the trade-off among mean, variance, and induced correlation, a trade-off overlooked by many existing policies.
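A heavily simplified sketch of the Monte Carlo rollout idea follows. Everything here is a schematic assumption of ours: the base policy is naive equal allocation rather than a classic R&S procedure, the posterior model is Gaussian with known variances, and the offline neural-network pre-training that defines AlphaRank proper is omitted entirely.

    import numpy as np

    def rollout_next_design(mean, var, count, remaining, n_rollouts=200, seed=0):
        """One-step Monte Carlo rollout for fixed-budget R&S (illustrative).

        State: posterior N(mean[i], var[i] / count[i]) for each of k designs.
        For each candidate design i, imagine sampling i once more, let a simple
        base policy (equal allocation) spend the remaining budget, and score the
        rollout by whether the posterior leader matches the sampled "true" best.
        """
        rng = np.random.default_rng(seed)
        k = len(mean)
        scores = np.zeros(k)
        for i in range(k):
            for _ in range(n_rollouts):
                truth = rng.normal(mean, np.sqrt(var / count))  # plausible true means
                m, c = mean.astype(float).copy(), count.astype(float).copy()
                alloc = np.full(k, remaining // k)
                alloc[i] += 1                                   # the candidate action
                for j in range(k):                              # base policy finishes
                    if alloc[j] > 0:
                        xs = rng.normal(truth[j], np.sqrt(var[j]), int(alloc[j]))
                        m[j] = (m[j] * c[j] + xs.sum()) / (c[j] + alloc[j])
                        c[j] += alloc[j]
                scores[i] += float(np.argmax(m) == np.argmax(truth))
        return int(np.argmax(scores))

In the paper's framework, the expensive inner Monte Carlo loop is what the offline-trained network learns to approximate, which is what makes online allocation fast.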
One Forward is Enough for Neural Network Training via Likelihood Ratio Method
Jiang, Jinyang, Zhang, Zeliang, Xu, Chenliang, Yu, Zhaofei, Peng, Yijie
While backpropagation (BP) is the mainstream approach for gradient computation in neural network training, its heavy reliance on the chain rule of differentiation constrains the design flexibility of network architectures and training pipelines. We avoid the recursive computation in BP and develop a unified likelihood ratio (ULR) method for gradient estimation with just one forward propagation. Not only can ULR be extended to train a wide variety of neural network architectures, but the computation flow in BP can also be rearranged by ULR for better device adaptation. Moreover, we propose several variance reduction techniques to further accelerate the training process. Our experiments offer numerical results across diverse aspects, including various neural network training scenarios, computation flow rearrangement, and fine-tuning of pre-trained models. All findings demonstrate that ULR effectively enhances the flexibility of neural network training by permitting localized module training without compromising the global objective, and that it significantly boosts network robustness.

Since backpropagation (BP) (Rumelhart et al., 1986) has greatly facilitated the success of artificial intelligence (AI) in various real-world scenarios (Song et al., 2021a;b; Sung et al., 2021), researchers have been motivated to connect this gradient computation method with human learning behavior (Scellier & Bengio, 2017; Lillicrap et al., 2020). However, there is no evidence that the learning mechanism in biological neurons relies on BP (Hinton, 2022). Pursuing alternatives to BP holds promise not only for advancing our understanding of learning mechanisms but also for developing more robust and interpretable AI systems. Moreover, the significant computational cost associated with BP (Gomez et al., 2017; Zhu et al., 2022) also calls for innovations that simplify and expedite the training process without heavy resource consumption. There have been continuous efforts to substitute BP in neural network training.
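The core mechanism can be illustrated with a single linear layer. The sketch below is our own minimal rendering of a likelihood-ratio gradient estimator with forward passes only; the layer shape, `loss_fn` interface, and mean-subtraction baseline are illustrative choices, not the paper's exact estimator.

    import numpy as np

    def ulr_linear_grad(W, b, x, loss_fn, sigma=0.1, n_copies=64, seed=0):
        """Likelihood-ratio gradient for one linear layer, no backward pass (sketch).

        Perturb the pre-activation z = W @ x + b with Gaussian noise sigma * eps.
        The perturbed output has density N(z, sigma^2 I), whose score w.r.t. z is
        eps / sigma, so grad_W E[loss] = E[loss * (eps / sigma) x^T] for the
        noise-perturbed loss: the gradient is estimated from loss values alone.
        """
        rng = np.random.default_rng(seed)
        z = W @ x + b
        eps = rng.standard_normal((n_copies, z.size))
        losses = np.array([loss_fn(z + sigma * e) for e in eps])  # forward only
        losses -= losses.mean()                 # baseline, a variance reduction step
        g = losses[:, None] * eps / sigma       # per-copy score-function weights
        grad_W = np.einsum('no,i->oi', g, x) / n_copies
        grad_b = g.mean(axis=0)
        return grad_W, grad_b

With a toy loss such as `loss_fn = lambda z: float(((z - t) ** 2).sum())`, the estimate approaches the analytic gradient of the noise-smoothed loss as `n_copies` grows. Because each layer's estimator needs only its own noise and the scalar loss, modules can be trained locally and the computation flow rearranged across devices.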
Efficient Learning for Selecting Top-m Context-Dependent Designs
Zhang, Gongbo, Chen, Sihua, Huang, Kuihua, Peng, Yijie
We consider a simulation optimization problem for context-dependent decision-making, which aims to determine the top-m designs for all contexts. Under a Bayesian framework, we formulate the optimal dynamic sampling decision as a stochastic dynamic programming problem and develop a sequential sampling policy to efficiently learn the performance of each design under each context. The asymptotically optimal sampling ratios are derived to attain the optimal large deviations rate for the worst-case probability of false selection. The proposed sampling policy is proved to be consistent, and its sampling ratios are shown to be asymptotically optimal. Numerical experiments demonstrate that the proposed method improves the efficiency of selecting the top-m context-dependent designs.
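The paper's policy targets the large-deviations-optimal sampling ratios; the sketch below is only a simple myopic heuristic in the same spirit, with all interface names ours: it locates the least-separated top-m boundary across contexts and samples the noisier design on that boundary.

    import numpy as np

    def next_sample(mean, var, count, m):
        """Choose the next (context, design) pair to simulate (illustrative heuristic).

        mean, var, count are (n_contexts, n_designs) posterior statistics. False
        selection happens at the boundary between the m-th and (m+1)-th best
        designs of some context, so we find the context whose boundary is least
        separated and sample the noisier design on that boundary.
        """
        worst_z, pick = np.inf, (0, 0)
        for c in range(mean.shape[0]):
            order = np.argsort(-mean[c])                  # designs, best first
            top, rest = order[:m], order[m:]
            i = top[np.argmin(mean[c, top])]              # weakest design inside top-m
            j = rest[np.argmax(mean[c, rest])]            # strongest design outside
            se = np.sqrt(var[c, i] / count[c, i] + var[c, j] / count[c, j])
            z = (mean[c, i] - mean[c, j]) / se            # standardized boundary gap
            if z < worst_z:
                side = i if var[c, i] / count[c, i] > var[c, j] / count[c, j] else j
                worst_z, pick = z, (c, side)
        return pick

Focusing effort on the binding boundary of the worst context mirrors the worst-case false-selection objective, though the asymptotically optimal ratios in the paper balance all contexts more finely than this greedy rule.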
A Novel Noise Injection-based Training Scheme for Better Model Robustness
Zhang, Zeliang, Jiang, Jinyang, Chen, Minjie, Wang, Zhiyuan, Peng, Yijie, Yu, Zhaofei
Noise injection-based methods have been shown to improve the robustness of artificial neural networks in previous work. In this work, we propose a novel noise injection-based training scheme for better model robustness. Specifically, we first develop a likelihood ratio method to estimate the gradient with respect to both synaptic weights and noise levels for stochastic gradient descent training. We then design an approximation of the vanilla noise injection-based training method to reduce memory usage and improve computational efficiency. Next, we apply the proposed scheme to spiking neural networks and evaluate classification accuracy and robustness on the MNIST and Fashion-MNIST datasets. Experimental results show that our method achieves much better adversarial robustness, and slightly better accuracy on clean data, than conventional gradient-based training.
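The weight gradient works as in the ULR sketch above; the distinctive piece here is treating the noise level itself as a trainable parameter. The snippet below is a hedged illustration for a single shared scalar noise level, with the interface and baseline ours: for z_noisy = z + sigma * eps with eps ~ N(0, 1), the score of the Gaussian density with respect to sigma is (eps^2 - 1) / sigma.

    import numpy as np

    def noise_level_grad(z, loss_fn, sigma, n_copies=64, seed=0):
        """Likelihood-ratio gradient of E[loss] w.r.t. the injected noise level (sketch).

        Since d/d(sigma) log N(z + sigma*eps; z, sigma^2) = (eps**2 - 1) / sigma,
        the noise level can be trained by SGD alongside the synaptic weights,
        again from loss values alone, without backpropagation.
        """
        rng = np.random.default_rng(seed)
        eps = rng.standard_normal((n_copies,) + np.shape(z))
        losses = np.array([loss_fn(z + sigma * e) for e in eps])
        losses -= losses.mean()                       # baseline for variance reduction
        scores = ((eps**2 - 1.0) / sigma).reshape(n_copies, -1).sum(axis=1)
        return float(np.mean(losses * scores))

Learning sigma lets the network settle on how much injected noise best trades clean accuracy against robustness, rather than fixing it as a hyperparameter.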
Quantile-Based Deep Reinforcement Learning using Two-Timescale Policy Gradient Algorithms
Jiang, Jinyang, Hu, Jiaqiao, Peng, Yijie
Classical reinforcement learning (RL) aims to optimize the expected cumulative reward. In this work, we consider the RL setting where the goal is to optimize the quantile of the cumulative reward. We parameterize the policy that controls actions with neural networks, and propose a novel policy gradient algorithm called Quantile-Based Policy Optimization (QPO), along with its variant Quantile-Based Proximal Policy Optimization (QPPO), for solving deep RL problems with quantile objectives. QPO uses two coupled iterations running at different timescales to simultaneously update quantiles and policy parameters, whereas QPPO is an off-policy version of QPO that allows multiple parameter updates during one simulation episode, leading to improved algorithm efficiency. Our numerical results indicate that the proposed algorithms outperform existing baseline algorithms under the quantile criterion.
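The two-timescale coupling can be sketched in a few lines. This is a schematic of our own, with the `episode` interface assumed and the paper's exact estimator, baselines, and step-size conditions simplified away; the quantile tracker runs on the faster timescale (larger steps) so it equilibrates around the current policy's quantile.

    import numpy as np

    def qpo_step(theta, q, episode, tau, alpha, beta):
        """One coupled QPO update (schematic; constants and conditions simplified).

        episode(theta) runs one trajectory under the current policy and returns
        (R, grad_logp): the cumulative reward and the gradient of the trajectory
        log-likelihood w.r.t. theta. The quantile tracker q is a Robbins-Monro
        recursion solving F(q) = tau; the policy moves along an estimate of the
        quantile-ascent direction, which is -1{R <= q} * grad_logp up to a
        positive factor involving the reward density.
        """
        R, grad_logp = episode(theta)
        q_new = q + alpha * (tau - float(R <= q))             # faster timescale
        theta_new = theta - beta * float(R <= q) * grad_logp  # slower timescale
        return theta_new, q_new

QPPO's gain over this vanilla loop comes from reusing each simulated episode for several parameter updates via off-policy corrections, which the single-update step above does not attempt.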
An Efficient Dynamic Sampling Policy For Monte Carlo Tree Search
Zhang, Gongbo, Peng, Yijie, Xu, Yilong
Monte Carlo Tree Search (MCTS) is a popular tree-based search strategy within the framework of reinforcement learning (RL), which estimates the optimal value of a state and action by building a tree with Monte Carlo simulation. It has been widely used in sequential decision making, including scheduling, inventory and production management, and real-world games such as Go, Chess, Tic-tac-toe, and Chinese Checkers; see Browne et al. (2012), Fu (2018), and Świechowski et al. (2021) for thorough overviews. MCTS requires little or no domain knowledge and improves itself by running more simulations. Many variants have been proposed to improve its performance; in particular, deep neural networks have been combined with MCTS to achieve remarkable success in the game of Go (Silver et al. 2016, 2017). A basic MCTS builds a game tree from the root node in an incremental and asymmetric manner, where nodes correspond to states and edges correspond to possible state-action pairs.
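For reference, here is the basic loop that such sampling policies refine: the classic four-phase UCT search. The `env` interface (actions, step, is_terminal, rollout) is a hypothetical stand-in for the problem at hand, not an API from the paper.

    import math, random

    class Node:
        def __init__(self, state, parent=None, action=None):
            self.state, self.parent, self.action = state, parent, action
            self.children, self.visits, self.value = [], 0, 0.0

    def uct_search(root_state, env, n_iters=1000, c=1.4):
        """Basic UCT: select, expand, simulate, backpropagate (sketch)."""
        root = Node(root_state)
        for _ in range(n_iters):
            node = root
            # 1. Selection: descend via the UCB1 rule while fully expanded.
            while node.children and len(node.children) == len(env.actions(node.state)):
                node = max(node.children, key=lambda ch: ch.value / ch.visits
                           + c * math.sqrt(math.log(node.visits) / ch.visits))
            # 2. Expansion: add one untried action if the node is not terminal.
            if not env.is_terminal(node.state):
                tried = {ch.action for ch in node.children}
                a = random.choice([a for a in env.actions(node.state) if a not in tried])
                node.children.append(Node(env.step(node.state, a), node, a))
                node = node.children[-1]
            # 3. Simulation: a random playout estimates the new node's value.
            reward = env.rollout(node.state)
            # 4. Backpropagation: update statistics along the path to the root.
            while node is not None:
                node.visits += 1
                node.value += reward
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).action

The dynamic sampling policy proposed in the paper replaces the UCB1 selection rule in step 1 with an allocation aimed more directly at selecting the best action efficiently.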
Quantile-Based Policy Optimization for Reinforcement Learning
Jiang, Jinyang, Hu, Jiaqiao, Peng, Yijie
Classical reinforcement learning (RL) aims to optimize the expected cumulative reward. In this work, we consider the RL setting where the goal is to optimize the quantile of the cumulative reward. We parameterize the policy that controls actions with neural networks and propose a novel policy gradient algorithm called Quantile-Based Policy Optimization (QPO), along with its variant Quantile-Based Proximal Policy Optimization (QPPO), to solve deep RL problems with quantile objectives. QPO uses two coupled iterations running at different time scales to simultaneously estimate quantiles and update policy parameters, and it is shown to converge to the globally optimal policy under certain conditions. Our numerical results demonstrate that the proposed algorithms outperform existing baseline algorithms under the quantile criterion.
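The convergence analysis rests on the standard quantile sensitivity identity, which relates the quantile gradient to the distribution function of the reward. A schematic statement of the objective and the coupled recursions, with the exact estimator and step-size conditions left to the paper, is:

    % F(.;theta) is the CDF of the cumulative reward R under policy pi_theta,
    % f its density, tau the target quantile level, and xi_k the k-th trajectory.
    \begin{align*}
      q_\tau(\theta) &= \inf\{\, q : F(q;\theta) \ge \tau \,\}, &
      \nabla_\theta\, q_\tau(\theta) &= -\left.\frac{\nabla_\theta F(q;\theta)}{f(q;\theta)}\right|_{q = q_\tau(\theta)}, \\
      q_{k+1} &= q_k + \alpha_k\bigl(\tau - \mathbf{1}\{R_k \le q_k\}\bigr), &
      \theta_{k+1} &= \theta_k - \beta_k\,\mathbf{1}\{R_k \le q_k\}\,\nabla_\theta \log p_{\theta_k}(\xi_k),
    \end{align*}

with the quantile recursion on the faster timescale ($\alpha_k / \beta_k \to \infty$), so the tracker $q_k$ stays near the current policy's quantile while $\theta_k$ drifts slowly along the ascent direction.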
Training Artificial Neural Networks by Generalized Likelihood Ratio Method: Exploring Brain-like Learning to Improve Adversarial Defensiveness
Xiao, Li, Peng, Yijie, Hong, Jeff, Ke, Zewu
Recent work in deep learning has shown that artificial neural networks are vulnerable to adversarial attacks, where a very small perturbation of the inputs can drastically alter the classification result. In this work, we propose a generalized likelihood ratio method capable of training artificial neural networks with some biological brain-like mechanisms, e.g., (a) learning by the loss value, and (b) learning via neurons with discontinuous activation and loss functions. The traditional backpropagation method cannot train artificial neural networks with the aforementioned brain-like learning mechanisms. Numerical results show that various artificial neural networks trained by the new method significantly improve defense against adversarial attacks. Code is available at \url{https://github.com/LX-doctorAI/GLR_ADV}.
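To see why backpropagation fails here while a likelihood-ratio approach does not, consider a unit with a Heaviside activation: its derivative is zero almost everywhere, so the chain rule carries no signal through it. The single-layer schematic below, with the quadratic loss and all shapes our own illustrative choices, estimates the weight gradient from loss values alone.

    import numpy as np

    def glr_grad_threshold(W, x, y, sigma=0.5, n_copies=256, seed=0):
        """GLR-style gradient for a layer with a discontinuous activation (sketch).

        The unit fires via a Heaviside step, which blocks backpropagation.
        Injecting Gaussian noise into the pre-activation and weighting the
        observed loss value by the noise score eps / sigma still yields an
        unbiased gradient estimate for W w.r.t. the noise-perturbed loss.
        """
        rng = np.random.default_rng(seed)
        z = W @ x                                        # pre-activation
        eps = rng.standard_normal((n_copies, z.size))
        out = (z + sigma * eps > 0).astype(float)        # Heaviside activation
        losses = ((out - y) ** 2).sum(axis=1)            # per-copy squared loss
        losses -= losses.mean()                          # control-variate baseline
        g = losses[:, None] * eps / sigma                # score-function weights
        return np.einsum('no,i->oi', g, x) / n_copies    # gradient estimate for W

Because only the scalar loss value enters the estimate ("learning by the loss value"), the same recipe applies to discontinuous loss functions, which is precisely the regime where backpropagation-based training is unavailable.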