Ensemble Neural Networks (ENN): A gradient-free stochastic method
Chen, Yuntian, Chang, Haibin, Jin, Meng, Zhang, Dongxiao
Abstract: In this study, an efficient stochastic gradient-free method, the ensemble neural networks (ENN), is developed. In the ENN, the optimization process relies on covariance matrices rather than derivatives. The covariance matrices are calculated by the ensemble randomized maximum likelihood algorithm (EnRML), which is an inverse modeling method. The ENN is able to simultaneously provide estimations and perform uncertainty quantification since it is built under the Bayesian framework. The ENN is also robust to small training data size because the ensemble of stochastic realizations essentially enlarges the training dataset. This constitutes a desirable characteristic, especially for real-world engineering applications. In addition, the ENN does not require the calculation of gradients, which enables the use of complicated neuron models and loss functions in neural networks. We experimentally demonstrate the benefits of the proposed model, in particular showing that the ENN performs much better than the traditional Bayesian neural networks (BNN). The EnRML in the ENN is a substitute for gradient-based optimization algorithms, which means that it can be directly combined with the feed-forward process in other existing (deep) neural networks, such as convolutional neural networks (CNN) and recurrent neural networks (RNN), broadening future applications of the ENN.

Keywords: Inverse modeling, Gradient-free, Uncertainty quantification, Robust to small data size, Stochastic method

1. Introduction

Artificial neural networks (ANN) are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is capable of approximating nonlinear functional relationships between input and output variables (Kim et al., 2018). From a mathematical perspective, a neural network can model any function up to any given precision with a sufficiently large number of basis functions (Cybenko, 1989; Hornik, 1991). In addition, we can even use much smaller models by constructing hierarchical neural networks (Delalleau & Bengio, 2011; Gal, 2016).

The basic processing elements of neural networks are neurons. A collection of neurons is referred to as a layer, and the collection of interconnected layers forms the neural network (Kim et al., 2018). A four-layer neural network is illustrated in Figure 1 as an example. In a neuron, the output is calculated by a nonlinear function of the sum of its inputs. The connections between different neurons from adjacent layers are represented by the weights in a model. The weights adjust as learning proceeds, and they represent the strength of the signal at a connection. The nonlinear function is also called the activation function, and the most popular choices are sigmoid, tansig, and ReLU (Li et al., 2015).

ANN has been widely applied to solving real-world engineering problems, and the following three topics are significant for effective applications.
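The neuron and layer description above can be illustrated with a minimal feed-forward sketch. It is not taken from the paper: the layer sizes, the bias terms, the linear output layer, the random initialization, and the helper names `forward` and `init_params` are all assumptions made for illustration, and tansig is represented here by the hyperbolic tangent.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid, maps any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear unit, zero for negative inputs.
    return np.maximum(0.0, x)

# "tanh" stands in for tansig (assumption: tansig is the hyperbolic tangent sigmoid).
ACTIVATIONS = {"sigmoid": sigmoid, "tanh": np.tanh, "relu": relu}

def init_params(layer_sizes, seed=0):
    # Hypothetical helper: one weight matrix and one bias vector per pair of adjacent layers.
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.5, size=(m, n))
               for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n) for n in layer_sizes[1:]]
    return weights, biases

def forward(x, weights, biases, activation="sigmoid"):
    # Each neuron applies the activation to the weighted sum of its inputs.
    # Assumption: the output layer is left linear, as is common for regression.
    act = ACTIVATIONS[activation]
    a = np.asarray(x, dtype=float)
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b                 # weighted sum of inputs (plus bias)
        a = z if i == last else act(z)
    return a

# Four layers of neurons: input (3), two hidden (5 each), output (1).
weights, biases = init_params([3, 5, 5, 1])
x = np.array([[0.2, -1.0, 0.7]])      # one sample with three input features
y = forward(x, weights, biases, activation="relu")
print(y.shape)
```

Only the forward pass is shown here. How the weights are adjusted during learning, whether by gradients or by the covariance-based update the abstract describes, is left out of this sketch.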
Aug-2-2019