Minimum width for universal approximation using ReLU networks on compact domain

Namjun Kim, Chanho Min, Sejun Park

arXiv.org Machine Learning 

Understanding what neural networks can or cannot do is a fundamental problem in the study of their expressive power. Initial approaches to this problem mostly focused on depth-bounded networks. For example, one line of research studies the size of a two-layer neural network required to memorize (i.e., perfectly fit) an arbitrary training dataset and shows that a number of parameters proportional to the dataset size is necessary and sufficient for various activation functions (Baum, 1988; Huang and Babri, 1998). Another important line of work investigates the class of functions that can be approximated by two-layer networks. Classical results in this field, represented by the universal approximation theorem, show that two-layer networks using a non-polynomial activation function are dense in the space of continuous functions on compact domains (Hornik et al., 1989; Cybenko, 1989; Leshno et al., 1993; Pinkus, 1999).

Along with the success of deep learning, the expressive power of deep neural networks has also been studied. As in the classical depth-bounded results, several works have shown that deep neural networks with bounded width can memorize an arbitrary training dataset (Yun et al., 2019; Vershynin, 2020) and can approximate any continuous function (Lu et al., 2017; Hanin and Sellke, 2017). Intriguingly, it has also been shown that deeper networks can be more expressive than shallow ones. For example, Telgarsky (2016), Eldan and Shamir (2016), and Daniely (2017) show that there is a class of functions that can be approximated by deep width-bounded networks with a small number of parameters but cannot be approximated by shallow networks without extremely large width.
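
The following is a minimal sketch, not taken from the paper, illustrating the classical two-layer universal approximation results cited above: a width-N ReLU network whose weights are set explicitly so that it realizes the piecewise-linear interpolant of a continuous target on the compact domain [0, 1]. The target function, knot placement, and function names are illustrative assumptions.

```python
# Sketch: a two-layer ReLU network that interpolates a continuous function on [0, 1].
# The construction writes the piecewise-linear interpolant with breakpoints x_0, ..., x_N as
#   g(x) = f(x_0) + sum_i (s_i - s_{i-1}) * relu(x - x_i),
# where s_i is the slope of segment i, so the network has N hidden ReLU units.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def two_layer_relu_interpolant(f, n_hidden, x_eval):
    """Evaluate a width-`n_hidden` two-layer ReLU network interpolating f on [0, 1]."""
    knots = np.linspace(0.0, 1.0, n_hidden + 1)        # breakpoints x_0, ..., x_N
    slopes = np.diff(f(knots)) / np.diff(knots)        # segment slopes s_0, ..., s_{N-1}
    coeffs = np.diff(slopes, prepend=0.0)              # output weights c_i = s_i - s_{i-1}
    biases = knots[:-1]                                # hidden-unit breakpoints
    hidden = relu(x_eval[:, None] - biases[None, :])   # hidden-layer activations
    return f(knots[0]) + hidden @ coeffs               # affine output layer

# Usage: the sup-norm error shrinks as the width grows, consistent with the
# universal approximation theorem for continuous targets on a compact domain.
target = lambda x: np.sin(2.0 * np.pi * x)             # illustrative target function
xs = np.linspace(0.0, 1.0, 2001)
for width in (4, 16, 64):
    err = np.max(np.abs(two_layer_relu_interpolant(target, width, xs) - target(xs)))
    print(f"width {width:3d}: sup-norm error {err:.4f}")
```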
