Country
Minor Constraint Disturbances for Deep Semi-supervised Learning
Chu, Jielei, Liu, Jing, Wang, Hongjun, Gong, Zhiguo, Li, Tianrui
In high-dimensional data space, semi-supervised feature learning based on Euclidean distance shows instability under a broad set of conditions. Furthermore, the scarcity and high cost of labels prompt us to explore new semi-supervised learning methods with the fewest labels. In this paper, we develop a novel Minor Constraint Disturbances-based Deep Semi-supervised Feature Learning framework (MCD-DSFL) from the perspective of probability distribution for feature representation. There are two fundamental modules in the proposed framework: one is a Minor Constraint Disturbances-based restricted Boltzmann machine with Gaussian visible units (MCDGRBM) for modelling continuous data and the other is a Minor Constraint Disturbances-based restricted Boltzmann machine (MCDRBM) for modelling binary data. The Minor Constraint Disturbances (MCD) consist of less instance-level constraints which are produced by only two randomly selected labels from each class. The Kullback-Leibler (KL) divergences of the MCD are fused into the Contrastive Divergence (CD) learning for training the proposed MCDGRBM and MCDRBM models. Then, the probability distributions of hidden layer features are as similar as possible in the same class and they are as dissimilar as possible in the different classes simultaneously. Despite the weak influence of the MCD for our shallow models (MCDGRBM and MCDRBM), the proposed deep MCD-DSFL framework improves the representation capability significantly under its leverage effect. The semi-supervised strategy based on the KL divergence of the MCD significantly reduces the reliance on the labels and improves the stability of the semi-supervised feature learning in high-dimensional space simultaneously.
Estimation of Rate Control Parameters for Video Coding Using CNN
Santamaria, Maria, Izquierdo, Ebroul, Blasi, Saverio, Mrak, Marta
Rate-control is essential to ensure efficient video delivery. Typical rate-control algorithms rely on bit allocation strategies, to appropriately distribute bits among frames. As reference frames are essential for exploiting temporal redundancies, intra frames are usually assigned a larger portion of the available bits. In this paper, an accurate method to estimate number of bits and quality of intra frames is proposed, which can be used for bit allocation in a rate-control scheme. The algorithm is based on deep learning, where networks are trained using the original frames as inputs, while distortions and sizes of compressed frames after encoding are used as ground truths. Two approaches are proposed where either local or global distortions are predicted.
Taylor Expansion Policy Optimization
Tang, Yunhao, Valko, Michal, Munos, Rรฉmi
In this work, we investigate the application of Taylor expansions in reinforcement learning. In particular, we propose Taylor expansion policy optimization, a policy optimization formalism that generalizes prior work (e.g., TRPO) as a first-order special case. We also show that Taylor expansions intimately relate to off-policy evaluation. Finally, we show that this new formulation entails modifications which improve the performance of several state-of-the-art distributed algorithms.
What Information Does a ResNet Compress?
Darlow, Luke Nicholas, Storkey, Amos
The information bottleneck principle (Shwartz-Ziv & Tishby, 2017) suggests that SGD-based training of deep neural networks results in optimally compressed hidden layers, from an information theoretic perspective. However, this claim was established on toy data. The goal of the work we present here is to test whether the information bottleneck principle is applicable to a realistic setting using a larger and deeper convolutional architecture, a ResNet model. We trained PixelCNN++ models as inverse representation decoders to measure the mutual information between hidden layers of a ResNet and input image data, when trained for (1) classification and (2) autoencoding. We find that two stages of learning happen for both training regimes, and that compression does occur, even for an autoencoder. Sampling images by conditioning on hidden layers' activations offers an intuitive visualisation to understand what a ResNets learns to forget.
Identification of AC Networks via Online Learning
Fabbiani, Emanuele, Nahata, Pulkit, De Nicolao, Giuseppe, Ferrari-Trecate, Giancarlo
With the advent of renewable energy resources, generation in power networks is drifting from the classical centralized paradigm to an increasingly distributed scenario. While offering many advantages, renewable-based generation can compromise grid reliability, due to its intermittent nature and creation of reverse power flows. In order to guarantee the safe operation of power systems and avoid dangerous phenomena like blackouts, innovative and efficient control algorithms are necessary. Nevertheless, advanced algorithms necessitate grid identification, that is, the knowledge of grid topology and line parameters. Most works on the identification of electric networks focus on topology verification, assuming a known initial topology and aiming at detecting sparse changes, such as line trips or switch activations [1, 2]. More recently, attention has shifted to the estimation of network topology and line parameters without any apriori information. Two main branches of research have appeared. On the one hand, works like [3, 4] propose learning algorithms that exploit the statistical properties of nodal measurements to determine the operational structure and the line impedances. These approaches have the major advantage of accounting for buses with no available measurements (hidden nodes) [4], although restrictive assumptions are required, e.g.
On the effectiveness of convolutional autoencoders on image-based personalized recommender systems
Blanco-Mallo, E., Remeseiro, B., Bolรณn-Canedo, V., Alonso-Betanzos, A.
Recommender systems (RS) are increasingly present in our daily lives, especially since the advent of Big Data, which allows for storing all kinds of information about users' preferences. Personalized RS are successfully applied in platforms such as Netflix, Amazon or YouTube. However, they are missing in gastronomic platforms such as TripAdvisor, where moreover we can find millions of images tagged with users' tastes. This paper explores the potential of using those images as sources of information for modeling users' tastes and proposes an image-based classification system to obtain personalized recommendations, using a convolutional autoencoder as feature extractor. The proposed architecture will be applied to TripAdvisor data, using users' reviews that can be defined as a triad composed by a user, a restaurant, and an image of it taken by the user. Since the dataset is highly unbalanced, the use of data augmentation on the minority class is also considered in the experimentation. Results on data from three cities of different sizes (Santiago de Compostela, Barcelona and New York) demonstrate the effectiveness of using a convolutional autoencoder as feature extractor, instead of the standard deep features computed with convolutional neural networks.
Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
Dauber, Assaf, Feder, Meir, Koren, Tomer, Livni, Roi
The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-days overparameterized learning algorithms. This notion refers to the tendency of the optimization algorithm towards a certain structured solution that often generalizes well. Recently, several papers have studied implicit regularization and were able to identify this phenomenon in various scenarios. We revisit this paradigm in arguably the simplest non-trivial setup, and study the implicit bias of Stochastic Gradient Descent (SGD) in the context of Stochastic Convex Optimization. As a first step, we provide a simple construction that rules out the existence of a \emph{distribution-independent} implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of \emph{distribution-dependent} implicit regularizers from explaining generalization, which includes strongly convex regularizers as well as non-degenerate norm-based regularizations. Certain aspects of our constructions point out to significant difficulties in providing a comprehensive explanation of an algorithm's generalization performance by solely arguing about its implicit regularization properties.
Ultra Efficient Transfer Learning with Meta Update for Cross Subject EEG Classification
Duan, Tiehang, Chauhan, Mihir, Shaikh, Mohammad Abuzar, Srihari, Sargur
Electroencephalogram (EEG) signal is widely used in brain computer interfaces (BCI), the pattern of which differs significantly across different subjects, and poses a major challenge for real world application of EEG classifiers. We found an efficient transfer learning method, named Meta UPdate Strategy (MUPS), boosts cross subject classification performance of EEG signals, and only need a small amount of data from target subject. The model tackles the problem with a two step process: (1) extract versatile features that are effective across all source subjects, and (2) adapt the model to target subject. The proposed model, which originates from meta learning, aims to find feature representation that is broadly suitable for different subjects, and maximizes sensitivity of the loss function on new subject such that one or a small number of gradient steps can lead to effective adaptation. The method can be applied to all deep learning oriented models. We performed extensive experiments on two public datasets, the proposed MUPS model outperforms current state of the arts by a large margin on accuracy and AUC-ROC when only a small amount of target data is used.
Learning Graph Embedding with Limited Labeled Data: An Efficient Sampling Approach
Li, Qirui, Liu, Xiaoming, Shen, Chao, Peng, Xi, Zhou, Yadong, Guan, Xiaohong
Semi-supervised graph embedding methods represented by graph convolutional network has become one of the most popular methods for utilizing deep learning approaches to process the graph-based data for applications. Mostly existing work focus on designing novel algorithm structure to improve the performance, but ignore one common training problem, i.e., could these methods achieve the same performance with limited labelled data? To tackle this research gap, we propose a sampling-based training framework for semi-supervised graph embedding methods to achieve better performance with smaller training data set. The key idea is to integrate the sampling theory and embedding methods by a pipeline form, which has the following advantages: 1) the sampled training data can maintain more accurate graph characteristics than uniformly chosen data, which eliminates the model deviation; 2) the smaller scale of training data is beneficial to reduce the human resource cost to label them; The extensive experiments show that the sampling-based method can achieve the same performance only with 10$\%$-50$\%$ of the scale of training data. It verifies that the framework could extend the existing semi-supervised methods to the scenarios with the extremely small scale of labelled data.
B-PINNs: Bayesian Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Noisy Data
Yang, Liu, Meng, Xuhui, Karniadakis, George Em
We propose a Bayesian physics-informed neural network (B-PINN) to solve both forward and inverse nonlinear problems described by partial differential equations (PDEs) and noisy data. In this Bayesian framework, the Bayesian neural network (BNN) combined with a PINN for PDEs serves as the prior while the Hamiltonian Monte Carlo (HMC) or the variational inference (VI) could serve as an estimator of the posterior. B-PINNs make use of both physical laws and scattered noisy measurements to provide predictions and quantify the aleatoric uncertainty arising from the noisy data in the Bayesian framework. Compared with PINNs, in addition to uncertainty quantification, B-PINNs obtain more accurate predictions in scenarios with large noise due to their capability of avoiding overfitting. We conduct a systematic comparison between the two different approaches for the B-PINN posterior estimation (i.e., HMC or VI), along with dropout used for quantifying uncertainty in deep neural networks. Our experiments show that HMC is more suitable than VI for the B-PINNs posterior estimation, while dropout employed in PINNs can hardly provide accurate predictions with reasonable uncertainty. Finally, we replace the BNN in the prior with a truncated Karhunen-Lo\`eve (KL) expansion combined with HMC or a deep normalizing flow (DNF) model as posterior estimators. The KL is as accurate as BNN and much faster but this framework cannot be easily extended to high-dimensional problems unlike the BNN based framework.