Analytical Solution of a Three-layer Network with a Matrix Exponential Activation Function

Gai, Kuo; Zhang, Shihua

arXiv.org Machine Learning 

In practice, deeper networks tend to be more powerful than shallow ones, but this has not been understood theoretically. In this paper, we find an analytical solution of a three-layer network with a matrix exponential activation function, i.e., f(X) = W_3 exp(W_2 exp(W_1 X)). Our proof shows the power of depth and the use of a non-linear activation function, since a one-layer network can only solve one equation, i.e., Y = W X.

Deep neural networks have become successful in many fields, including computer vision, natural language processing, bioinformatics, etc. However, the mathematical principles of deep learning are still not fully understood, especially why deeper networks with non-linear activation functions tend to be more powerful than shallower ones. It is well known that sufficiently large depth-2 neural networks with reasonable activation functions can approximate any continuous function on a bounded domain (Cybenko, 1989; Funahashi, 1989; Hornik et al., 1989; Barron, 1994; Pinkus, 1999), but this may require the width of the networks to be exponential. Recent work has shown that some functions can be approximated by deeper networks with far fewer neurons than by shallower ones, such as radial functions (Eldan & Shamir, 2016), Boolean circuits (Rossman et al., 2015), or functions induced by neural networks (Telgarsky, 2016).
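To make the contrast concrete, the sketch below (a minimal illustration, not the authors' code) compares the two maps discussed above: the one-layer map Y = W X, which a single invertible pair (X, Y) already determines exactly via W = Y X^{-1}, and a three-layer forward map assumed here to have the form f(X) = W_3 exp(W_2 exp(W_1 X)); the function names and the use of scipy.linalg.expm are illustrative choices.

import numpy as np
from scipy.linalg import expm  # matrix exponential exp(A) = sum_k A^k / k!

rng = np.random.default_rng(0)
d = 4  # all matrices are d x d

def forward_one_layer(W, X):
    # One-layer linear map: Y = W X. For an invertible X, the single
    # equation Y = W X is solved exactly by W = Y X^{-1}.
    return W @ X

def forward_three_layer(W1, W2, W3, X):
    # Assumed three-layer map with matrix exponential activation:
    # f(X) = W3 exp(W2 exp(W1 X)).
    return W3 @ expm(W2 @ expm(W1 @ X))

# One-layer case: a single pair (X, Y) pins down W completely,
# so a generic second pair (X2, Y2) cannot also be fit.
X = rng.standard_normal((d, d))
Y = rng.standard_normal((d, d))
W = Y @ np.linalg.inv(X)                         # exact solution of Y = W X
print(np.allclose(forward_one_layer(W, X), Y))   # True

# Three-layer case: evaluate the deeper, non-linear forward map.
W1, W2, W3 = (rng.standard_normal((d, d)) for _ in range(3))
Y1 = forward_three_layer(W1, W2, W3, X)
print(Y1.shape)                                  # (4, 4)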
