identity mapping
Convergence Analysis of Two-layer Neural Networks with ReLU Activation
In recent years, stochastic gradient descent (SGD) based techniques has become the standard tools for training neural networks. However, formal theoretical understanding of why SGD can train neural networks in practice is largely missing. In this paper, we make progress on understanding this mystery by providing a convergence analysis for SGD on a rich subset of two-layer feedforward networks with ReLU activations. This subset is characterized by a special structure called identity mapping. We prove that, if input follows from Gaussian distribution, with standard $O(1/\sqrt{d})$ initialization of the weights, SGD converges to the global minimum in polynomial number of steps.
FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction
Shuyang Sun, Jiangmiao Pang, Jianping Shi, Shuai Yi, Wanli Ouyang
The basic principles in designing convolutional neural network (CNN) structures for predicting objects on different levels, e.g., image-level, region-level, and pixellevel, are diverging. Generally, network structures designed specifically for image classification are directly used as default backbone structure for other tasks including detection and segmentation, but there is seldom backbone structure designed under the consideration of unifying the advantages of networks designed for pixellevel or region-level predicting tasks, which may require very deep features with high resolution. Towards this goal, we design a fish-like network, called FishNet. In FishNet, the information of all resolutions is preserved and refined for the final task. Besides, we observe that existing works still cannot directly propagate the gradient information from deep layers to shallow layers. Our design can better handle this problem. Extensive experiments have been conducted to demonstrate the remarkable performance of the FishNet. In particular, on ImageNet-1k, the accuracy of FishNet is able to surpass the performance of DenseNet and ResNet with fewer parameters. FishNet was applied as one of the modules in the winning entry of the COCO Detection 2018 challenge.
FactorGraphNeuralNet--SupplementaryFile AProof of propositions
First we provide Lemma 8, which will be used in the proof of Proposition 2 and 4. Lemma 8. Given n non-negative feature vectors fi =[fi0,fi1,...,fim], where i=1,...,n, there exists n matrices Qi with shape nm m and n vector ˆfi =QifTi, s.t. Proposition 2. A factor graph G =(V,C,E) with variable log potentialsθi(xi) and factor log potentials ϕc(xc) can be converted to a factor graph G0 with the same variable potentials and the decomposed log-potentials ϕic(xi,zc) using a one-layer FGNN. Without loss of generality, we assume that logφc(xc)>1. Then for each i the item θic(xi,zc) in (9) have kn+1 entries, and each entry is either a scaled entry of the vectorgc or arbitrary negative number less than maxxcθc(xc). Thusifweorganize θic(xi,zc) asalength-kn+1 vector fic, thenwedefinea kn+1 kn matrix Qci, where if and only if thelth entry of fic is set to the mth entry of gc multiplied by 12 1/|s(c)|, the entry of Qci in lth row, mth column will be set to 1/|s(c)|; all the other entries of Qci is set to some negative number smaller than maxxcθc(xc).
Convergence Analysis of Two-layer Neural Networks with ReLU Activation
In recent years, stochastic gradient descent (SGD) based techniques has become the standard tools for training neural networks. However, formal theoretical understanding of why SGD can train neural networks in practice is largely missing. In this paper, we make progress on understanding this mystery by providing a convergence analysis for SGD on a rich subset of two-layer feedforward networks with ReLU activations. This subset is characterized by a special structure called identity mapping. We prove that, if input follows from Gaussian distribution, with standard $O(1/\sqrt{d})$ initialization of the weights, SGD converges to the global minimum in polynomial number of steps.