Preventing Imitation Learning with Adversarial Policy Ensembles
Zhan, Albert, Tiomkin, Stas, Abbeel, Pieter
Imitation learning can reproduce policies by observing experts, which poses a problem for policy privacy. Policies, whether those of humans or those running on deployed robots, can all be cloned without the consent of their owners. How can we protect against external observers cloning our proprietary policies? To answer this question we introduce a new reinforcement learning framework, in which we train an ensemble of near-optimal policies whose demonstrations are guaranteed to be useless to an external observer. We formulate this idea as a constrained optimization problem, where the objective is to improve the proprietary policies and, at the same time, to degrade the virtual policy of an eventual external observer. We design a tractable algorithm to solve this new optimization problem by modifying the standard policy gradient algorithm. Our formulation can be interpreted through the lenses of confidentiality and adversarial behaviour, which enables a broader perspective on this work. We demonstrate the existence of "non-clonable" ensembles, which solve the above optimization problem and are computed by our modified policy gradient algorithm. To our knowledge, this is the first work on the protection of policies in Reinforcement Learning.
Quaternion-Valued Recurrent Projection Neural Networks on Unit Quaternions
Valle, Marcos Eduardo, Lobo, Rodolfo Anibal
Hypercomplex-valued neural networks, including quaternion-valued neural networks, can treat multidimensional data as a single entity. In this paper, we present the quaternion-valued recurrent projection neural networks (QRPNNs). Briefly, QRPNNs are obtained by combining non-local projection learning with the quaternion-valued recurrent correlation neural networks (QRCNNs). We show that QRPNNs overcome the crosstalk problem of QRCNNs. Thus, they are appropriate to implement associative memories. Furthermore, computational experiments reveal that QRPNNs exhibit greater storage capacity and noise tolerance than their corresponding QRCNNs.
Introduction
The Hopfield neural network, developed in the early 1980s, is an important and widely known recurrent neural network which can be used to implement associative memories [1, 2]. Successful applications of the Hopfield network include control [3, 4], computer vision and image processing [5, 6], classification [7, 8], and optimization [2, 9, 10]. Despite its many successful applications, the Hopfield network may suffer from a very low storage capacity when used to implement associative memories. Precisely, due to crosstalk between the stored items, the Hebbian learning adopted by Hopfield in his original work allows for the storage of approximately n/(2 ln n) items, where n denotes the length of the stored vectors [11]. To address this limitation, Personnaz et al. [12] as well as Kanter and Sompolinsky [13] proposed the projection rule to determine the synaptic weights of the Hopfield networks. The projection rule increases the storage capacity of the Hopfield network to n − 1 items. Another simple but effective improvement on the storage capacity of the original Hopfield networks was achieved by Chiueh and Goodman's recurrent correlation neural networks (RCNNs) [14, 15]. Briefly, an RCNN is obtained by decomposing the Hopfield network with Hebbian learning into a two-layer recurrent neural network.
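To make the two capacity figures quoted above concrete, the following minimal sketch evaluates n/(2 ln n) for Hebbian learning and n − 1 for the projection rule. The function names are hypothetical and chosen only for illustration; the formulas are the ones stated in the text, not a new result.

```python
import math


def hebbian_capacity(n):
    """Approximate item count under Hebbian learning: n / (2 ln n)."""
    return n / (2.0 * math.log(n))


def projection_capacity(n):
    """Item count quoted in the text for the projection rule: n - 1."""
    return n - 1


# Compare the two figures for a few vector lengths.
for n in (100, 1000):
    print(n, round(hebbian_capacity(n), 1), projection_capacity(n))
```

For vectors of length n = 100, this gives roughly 11 items for Hebbian learning against 99 for the projection rule, which shows how severe the crosstalk limitation is.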
Semantic Discord: Finding Unusual Local Patterns for Time Series
Zhang, Li, Gao, Yifeng, Lin, Jessica
Finding anomalous subsequences in a long time series is a very important but difficult problem. Existing state-of-the-art methods have focused on searching for the subsequence that is the most dissimilar to the rest of the subsequences; however, they do not take into account the background patterns that contain the anomalous candidates. As a result, such approaches are likely to miss local anomalies. We introduce a new definition named \textit{semantic discord}, which incorporates the context information from larger subsequences containing the anomaly candidates. We propose an efficient algorithm with a derived lower bound that is up to 3 orders of magnitude faster than the brute-force algorithm on real-world data. We demonstrate through extensive experiments that our method significantly outperforms the state-of-the-art methods in locating anomalies. We further explain the interpretability of semantic discord.
CosmoVAE: Variational Autoencoder for CMB Image Inpainting
Yi, Kai, Guo, Yi, Fan, Yanan, Hamann, Jan, Wang, Yu Guang
Cosmic microwave background radiation (CMB) is critical to the understanding of the early universe and precise estimation of cosmological constants. Due to contamination by thermal dust noise in the galaxy, the CMB map, which is an image on the two-dimensional sphere, has missing observations, mainly concentrated in the equatorial region. The noise of the CMB map has a significant impact on the estimation precision for cosmological parameters. Inpainting the CMB map can effectively reduce the uncertainty of parametric estimation. In this paper, we propose a deep learning-based variational autoencoder, CosmoVAE, to restore the missing observations of the CMB map. The input and output of CosmoVAE are square images. To generate training, validation, and test data sets, we segment the full-sky CMB map into many small images by Cartesian projection. CosmoVAE assigns physical quantities to the parameters of the VAE network by using the angular power spectrum of the Gaussian random field as latent variables. CosmoVAE adopts a new loss function to improve the learning performance of the model, which consists of $\ell_1$ reconstruction loss, Kullback-Leibler divergence between the posterior distribution of the encoder network and the prior distribution of the latent variables, perceptual loss, and a total-variation regularizer. The proposed model achieves state-of-the-art performance for Planck \texttt{Commander} 2018 CMB map inpainting.
Learning the Hypotheses Space from data Part II: Convergence and Feasibility
Marcondes, Diego, Simonis, Adilson, Barrera, Junior
In part \textit{I} we proposed a structure for a general Hypotheses Space $\mathcal{H}$, the Learning Space $\mathbb{L}(\mathcal{H})$, which can be employed to avoid \textit{overfitting} when estimating in a complex space with a relative shortage of examples. We also presented the U-curve property, which can be exploited to select a Hypotheses Space without exhaustively searching $\mathbb{L}(\mathcal{H})$. In this paper, we carry our agenda further by showing the consistency of a model selection framework based on Learning Spaces, in which one selects from data the Hypotheses Space on which to learn. The method developed in this paper adds to the state-of-the-art in model selection by extending Vapnik-Chervonenkis Theory to \textit{random} Hypotheses Spaces, i.e., Hypotheses Spaces learned from data. In this framework, one estimates a random subspace $\hat{\mathcal{M}} \in \mathbb{L}(\mathcal{H})$ which converges with probability one to a target Hypotheses Space $\mathcal{M}^{\star} \in \mathbb{L}(\mathcal{H})$ with desired properties. As the convergence implies asymptotically unbiased estimators, we have a consistent framework for model selection, showing that it is feasible to learn the Hypotheses Space from data. Furthermore, we show that the generalization errors of learning on $\hat{\mathcal{M}}$ are smaller than those we commit when learning on $\mathcal{H}$, so it is more efficient to learn on a subspace learned from data.
Analytic Study of Double Descent in Binary Classification: The Impact of Loss
Kini, Ganesh, Thrampoulidis, Christos
Extensive empirical evidence reveals that, for a wide range of different learning methods and datasets, the risk curve exhibits a double-descent (DD) trend as a function of the model size. In a recent paper [Zeyu,Kammoun,Thrampoulidis,2019] the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a DD. In this paper, we complement these results by extending them to GD with square loss. We show that the DD phenomenon persists, but we also identify several differences compared to logistic loss. This emphasizes that crucial features of DD curves (such as their transition threshold and global minima) depend both on the training data and on the learning algorithm. We further study the dependence of DD curves on the size of the training set. Similar to our earlier work, our results are analytic: we plot the DD curves by first deriving sharp asymptotics for the test error under Gaussian features. Albeit simple, the models permit a principled study of DD features, the outcomes of which theoretically corroborate related empirical findings occurring in more complex learning tasks.
Faster Projection-free Online Learning
In many online learning problems the computational bottleneck for gradient-based methods is the projection operation. For this reason, in many problems the most efficient algorithms are based on the Frank-Wolfe method, which replaces projections by linear optimization. In the general case, however, online projection-free methods require more iterations than projection-based methods: the best known regret bound scales as $T^{3/4}$. Despite significant work on various variants of the Frank-Wolfe method, this bound has remained unchanged for a decade. In this paper we give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization with smooth cost functions and one linear optimization computation per iteration. As opposed to previous Frank-Wolfe approaches, our algorithm is derived using the Follow-the-Perturbed-Leader method and is analyzed using an online primal-dual framework.
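The idea of replacing projections by linear optimization can be illustrated with the classical offline Frank-Wolfe iteration. The sketch below is not the online algorithm of the paper (which is derived from Follow-the-Perturbed-Leader). The quadratic objective, the probability simplex as feasible set, the step size 2/(t + 2), and all function names are assumptions chosen for illustration.

```python
def linear_oracle_simplex(grad):
    """Linear optimization over the probability simplex: return the
    vertex (one-hot vector) minimizing the inner product with grad."""
    best = min(range(len(grad)), key=lambda i: grad[i])
    return [1.0 if i == best else 0.0 for i in range(len(grad))]


def frank_wolfe_simplex(target, steps=2000):
    """Minimize ||x - target||^2 over the simplex without any projection."""
    n = len(target)
    x = [1.0] + [0.0] * (n - 1)  # start at a vertex of the simplex
    for t in range(steps):
        grad = [2.0 * (xi - bi) for xi, bi in zip(x, target)]
        v = linear_oracle_simplex(grad)  # one linear optimization per step
        gamma = 2.0 / (t + 2.0)          # standard Frank-Wolfe step size
        # A convex combination of feasible points stays feasible.
        x = [(1.0 - gamma) * xi + gamma * vi for xi, vi in zip(x, v)]
    return x


x = frank_wolfe_simplex([0.1, 0.6, 0.3])
print([round(xi, 3) for xi in x])
```

Each iterate is a convex combination of simplex vertices, so feasibility is maintained for free; the price is the slower convergence that the abstract refers to.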
A Sparsity Inducing Nuclear-Norm Estimator (SpINNEr) for Matrix-Variate Regression in Brain Connectivity Analysis
Brzyski, Damian, Hu, Xixi, Goni, Joaquin, Ances, Beau, Randolph, Timothy W., Harezlak, Jaroslaw
For example, it is of clinical interest to understand associations between: (a) alcoholism and the electrical activity of different brain regions over time collected from electroencephalography (EEG) (Li et al., 2010); (b) cognitive function and three-dimensional white-matter structure data collected from diffusion tensor imaging (DTI) (Goldsmith et al., 2014) for patients with multiple sclerosis (MS); and (c) cognitive impairment and the brain's metabolic activity data collected from three-dimensional positron emission tomography (PET) imaging (Wang et al., 2014). Our work focuses on the problem of identifying brain network connections that are associated with neurocognitive measures for HIV-infected individuals. The outcome (response) is a continuous variable and the predictors are matrix representations of functional connectivity between the brain's cortical regions. Biophysical considerations motivate our interest in estimating a matrix of regression coefficients that has the following two properties: (i) it should be relatively sparse, since we aim to identify connections that most strongly predict the outcome; and, more importantly, (ii) the response-related connections form clusters, since brain activity networks are known to consist of densely connected regions. These two properties translate to the coefficient matrix having relatively small clusters, or blocks, of nonzero entries, which implies that it is low-rank. Hence, we aim to solve the matrix regression problem by estimating a coefficient matrix that is both sparse and low-rank. To further illustrate our approach, consider the three matrices in Figure 1. The one in the left panel is sparse but full-rank, the one in the right panel is low-rank but not sparse, while the one in the middle panel is both low-rank and sparse, which is the structure we are interested in. To find such a solution, we propose a regularization method called SParsity Inducing Nuclear Norm EstimatoR (SpINNEr).
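The three structures contrasted above can be reproduced with small toy matrices. The matrices below are assumptions made for illustration (they are not the ones shown in Figure 1), and the helper names are hypothetical. Rank is computed by exact Gaussian elimination so that no numerical library is needed.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def count_nonzeros(rows):
    """Number of nonzero entries, a simple measure of sparsity."""
    return sum(1 for row in rows for v in row if v != 0)


n = 6
sparse_full_rank = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
low_rank_dense = [[1] * n for _ in range(n)]
sparse_low_rank = [[1 if i < 2 and j < 2 else 0 for j in range(n)]
                   for i in range(n)]

for name, mat in [("sparse, full-rank", sparse_full_rank),
                  ("low-rank, dense", low_rank_dense),
                  ("sparse and low-rank", sparse_low_rank)]:
    print(name, "rank =", matrix_rank(mat), "nonzeros =", count_nonzeros(mat))
```

The single 2-by-2 block of ones has rank 1 and only 4 nonzero entries, which is the kind of clustered structure that a combined sparsity and nuclear-norm penalty is meant to favour.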
How Does BN Increase Collapsed Neural Network Filters?
Zhou, Sheng, Wang, Xinjiang, Luo, Ping, Feng, Litong, Li, Wenjie, Zhang, Wei
Improving sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we disclose a harmful sparsifying process called filter collapse, which is common in DNNs with batch normalization (BN) and rectified linear activation functions (e.g. ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularizations such as $L_1$. This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result. This phenomenon becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is finetuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise and that the sparsifying probability is proportional to the square of learning rate and inversely proportional to the square of the scale parameter of BN. To prevent the undesirable collapsed filters, we propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training. With psBN, we can recover collapsed filters and increase the model performance in various tasks such as classification on CIFAR-10 and object detection on MS-COCO2017.
Multi-objective Neural Architecture Search via Non-stationary Policy Gradient
Chen, Zewei, Zhou, Fengwei, Trimponias, George, Li, Zhenguo
Multi-objective Neural Architecture Search (NAS) aims to discover novel architectures in the presence of multiple conflicting objectives. Despite recent progress, the problem of approximating the full Pareto front accurately and efficiently remains challenging. In this work, we explore the novel reinforcement learning (RL) based paradigm of non-stationary policy gradient (NPG). NPG utilizes a non-stationary reward function, and encourages a continuous adaptation of the policy to capture the entire Pareto front efficiently. We introduce two novel reward functions with elements from the dominant paradigms of scalarization and evolution. To handle non-stationarity, we propose a new exploration scheme using cosine temperature decay with warm restarts. For fast and accurate architecture evaluation, we introduce a novel pre-trained shared model that we continuously fine-tune throughout training. Our extensive experimental study with various datasets shows that our framework can approximate the full Pareto front well and quickly. Moreover, our discovered cells can achieve superior predictive performance compared to other multi-objective NAS methods, and to single-objective NAS methods at similar network sizes. Our work demonstrates the potential of NPG as a simple, efficient, and effective paradigm for multi-objective NAS.
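A cosine temperature decay with warm restarts could look like the sketch below. The abstract does not give the exact schedule, so the formula is an assumption modelled on the familiar cosine annealing with restarts used for learning rates, and the parameter names and default values (`period`, `t_max`, `t_min`) are hypothetical.

```python
import math


def cosine_temperature(step, period=100, t_max=5.0, t_min=1.0):
    """Temperature decays from t_max towards t_min along a half cosine,
    then jumps back to t_max every `period` steps (a warm restart)."""
    t_cur = step % period  # position inside the current cycle
    cosine = math.cos(math.pi * t_cur / period)
    return t_min + 0.5 * (t_max - t_min) * (1.0 + cosine)


for step in (0, 50, 99, 100):
    print(step, round(cosine_temperature(step), 3))
```

A high temperature flattens the sampling distribution of the policy and encourages exploration, so each restart gives the policy a fresh exploratory phase, which is one plausible way to keep adapting under a non-stationary reward.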