randomized learning
A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
We study the generalization error of randomized learning algorithms -- focusing on stochastic gradient descent (SGD) -- using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm's random hyperparameters, including distributions that depend on the training data.
Reviews: A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
The paper opens the way to a new use of PAC-Bayesian theory, by combining PAC-Bayes with algorithmic stability to study stochastic optimization algorithms. The obtained probabilistic bounds are then used to inspire adaptive sampling strategies, studied empirically in a deep learning scenario. The paper is well written, and the proofs are non-trivial. It contains several clever ideas, namely the use of algorithmic stability to bound the complexity term inside PAC-Bayesian bounds. It's also fruitful to express the prior and posterior distributions over the sequences of indexes used by a stochastic gradient descent algorithm.
A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
We study the generalization error of randomized learning algorithms -- focusing on stochastic gradient descent (SGD) -- using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm's random hyperparameters, including distributions that depend on the training data. We analyze this algorithm in the context of our generalization bounds and evaluate it on a benchmark dataset. Our experiments demonstrate that adaptive sampling can reduce empirical risk faster than uniform sampling while also improving out-of-sample accuracy. Papers published at the Neural Information Processing Systems Conference.
A Constructive Approach for Data-Driven Randomized Learning of Feedforward Neural Networks
Feedforward neural networks with random hidden nodes suffer from a problem with the generation of random weights and biases as these are difficult to set optimally to obtain a good projection space. Typically, random parameters are drawn from an interval which is fixed before or adapted during the learning process. Due to the different functions of the weights and biases, selecting them both from the same interval is not a good idea. Recently more sophisticated methods of random parameters generation have been developed, such as the data-driven method proposed in \cite{Anon19}, where the sigmoids are placed in randomly selected regions of the input space and then their slopes are adjusted to the local fluctuations of the target function. In this work, we propose an extended version of this method, which constructs iteratively the network architecture. This method successively generates new hidden nodes and accepts them if the training error decreases significantly. The threshold of acceptance is adapted to the current training stage. At the beginning of the training process only those nodes which lead to the largest error reduction are accepted. Then, the threshold is reduced by half to accept those nodes which model the target function details more accurately. This leads to faster convergence and more compact network architecture, as it includes only "significant" neurons. Several application examples are given which confirm this thesis.
Improving Randomized Learning of Feedforward Neural Networks by Appropriate Generation of Random Parameters
In this work, a method of random parameters generation for randomized learning of a single-hidden-layer feedforward neural network is proposed. The method firstly, randomly selects the slope angles of the hidden neurons activation functions from an interval adjusted to the target function, then randomly rotates the activation functions, and finally distributes them across the input space. For complex target functions the proposed method gives better results than the approach commonly used in practice, where the random parameters are selected from the fixed interval. This is because it introduces the steepest fragments of the activation functions into the input hypercube, avoiding their saturation fragments. Keywords: Function approximation · Feedforward neural networks · Neural networks with random hidden nodes · Randomized learning algorithms. 1 Introduction Feedforward neural networks (FNNs) learn from data by iteratively tuning their parameters, weights and biases, using some form of gradient descent method.