Lu, Zhou, Pu, Hongming, Wang, Feicheng, Hu, Zhiqiang, Wang, Liwei

The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. We show a universal approximation theorem for width-bounded ReLU networks: width-(n 4) ReLU networks, where n is the input dimension, are universal approximators.

Lin, Hongzhou, Jegelka, Stefanie

We demonstrate that a very deep ResNet with stacked modules that have one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in d dimensions, i.e. \ell_1(R^d). Due to the identity mapping inherent to ResNets, our network has alternating layers of dimension one and d. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension d [21,11]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture.

Lin, Hongzhou, Jegelka, Stefanie

Recently, deep learning has been playing a central role in machine learning research and applications. Since AlexNet, increasingly more advanced networks have achieved state-of-the-art performance in computer vision, speech recognition, language processing, game playing, medical imaging, and so on. In our previous studies, we proposed quadratic/second-order neurons and deep quadratic neural networks. In a quadratic neuron, the inner product of a vector of data and the corresponding weights in a conventional neuron is replaced with a quadratic function. The resultant second-order neuron enjoys an enhanced expressive capability over the conventional neuron. However, how quadratic neurons improve the expressing capability of a deep quadratic network has not been studied up to now, preferably in relation to that of a conventional neural network. In this paper, we ask three basic questions regarding the expressive capability of a quadratic network: (1) for the one-hidden-layer network structure, is there any function that a quadratic network can approximate much more efficiently than a conventional network? (2) for the same multi-layer network structure, is there any function that can be expressed by a quadratic network but cannot be expressed with conventional neurons in the same structure? (3) Does a quadratic network give a new insight into universal approximation? Our main contributions are the three theorems shedding light upon these three questions and demonstrating the merits of a quadratic network in terms of expressive efficiency, unique capability, and compact architecture respectively.

Here, we report that the depth and the width of a neural network are dual from two perspectives. First, we employ the partially separable representation to determine the width and depth. Second, we use the De Morgan law to guide the conversion between a deep network and a wide network. Furthermore, we suggest the generalized De Morgan law to promote duality to network equivalency.