Minimum Width of Deep Narrow Networks for Universal Approximation
Xiao-Song Yang, Qi Zhou, Xuan Zhou
Determining the minimum width of fully connected neural networks has become a fundamental problem in recent theoretical studies of deep neural networks. In this paper, we study lower and upper bounds on the minimum width required for fully connected neural networks to have the universal approximation property, which is important for network design and training. We show that $w_{min}\leq\max(2d_x+1, d_y)$ also holds for networks with ELU and SELU activation functions, and that the upper bound of this inequality is attained when $d_y=2d_x$, where $d_x$ and $d_y$ denote the input and output dimensions, respectively. Moreover, we show that $d_x+1\leq w_{min}\leq d_x+d_y$ for networks with LeakyReLU, ELU, CELU, SELU, or Softplus activation functions, by proving that the ReLU activation function can be approximated by these activation functions. In addition, in the case that the activation function is injective or can be uniformly approximated by a sequence of injective functions (e.g., ReLU), we present a new proof of the inequality $w_{min}\ge d_y+\mathbf{1}_{d_x<d_y}$.
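For intuition (a standard computation, not taken from the paper), the approximation of ReLU by the activations listed above can be made quantitative; for instance, for Softplus with sharpness $\beta>0$ and for LeakyReLU with slope $\alpha\in(0,1)$ on a bounded interval $[-R,R]$,
$$\sup_{x\in\mathbb{R}}\left|\tfrac{1}{\beta}\log\!\left(1+e^{\beta x}\right)-\mathrm{ReLU}(x)\right|=\tfrac{\log 2}{\beta}\xrightarrow{\ \beta\to\infty\ }0,\qquad \sup_{|x|\le R}\left|\mathrm{LeakyReLU}_{\alpha}(x)-\mathrm{ReLU}(x)\right|=\alpha R\xrightarrow{\ \alpha\to 0^{+}\ }0,$$
so on compact domains a ReLU network can be emulated to arbitrary uniform accuracy by networks built from these activations.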
We thank the reviewers for the helpful feedback and the positive assessment of our submission. The text and bibliography have been revised following their suggestions.

Reviewer #1: "It is interesting to see if further increase the width of the network (from linear in d to polynomial in d and ...)" In the setting of our paper (minimization of the total network size), a large depth is in some sense unavoidable. However, in general there is of course some trade-off between width and depth.

Reviewer #4: "Theorem 5.1 extends the approximation results to all piece-wise linear activation functions and not just ... So in theory, this should also apply to max-outs and other variants of ReLUs such as Leaky ReLUs?" That's right: all these functions are easily expressible one via another using just linear operations.

Reviewer #4: "I fail to see some intuitions regarding the typical values of r, d, and H for the networks used in practice." ...

T. Poggio et al., Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review.
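The mutual expressibility mentioned in the response above can be written, for concreteness, as the following standard identities (not quoted from the rebuttal):
$$\mathrm{LeakyReLU}_{\alpha}(x)=\alpha x+(1-\alpha)\,\mathrm{ReLU}(x),\qquad \mathrm{ReLU}(x)=\frac{\mathrm{LeakyReLU}_{\alpha}(x)-\alpha x}{1-\alpha},\qquad \max(a,b)=\mathrm{ReLU}(a-b)+b,$$
so ReLU, Leaky ReLU, and two-piece maxout units can each simulate the others using only affine operations, at a constant-factor cost in network size.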
OctField: Hierarchical Implicit Functions for 3D Modeling - Supplemental Material - Jia-Heng Tang
In this supplemental material, we provide more details on the network architecture and additional visualization results, including shape reconstruction and comparison, shape generation, and shape interpolation. Furthermore, results on scene reconstruction and a comparison with Local Implicit Grid [3] are presented to demonstrate the advantage of our method for large-scale data representation, thanks to the hierarchical tree structure of our proposed OctField representation. The sections are organized as follows: Section 1 provides the details of the network architecture and training. Sections 2, 3, and 4 provide additional visualization results on a number of 3D modeling tasks, including shape reconstruction, generation, and interpolation. Section 5 presents four ablation studies: overlapping versus non-overlapping adjacent octants, the training strategy, the distinction of latent codes, and the subdivision parameter τ.
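To make the hierarchical octant structure concrete, below is a minimal Python sketch of an octree whose nodes carry per-octant latent codes and are split according to a threshold τ. The class name Octant, the occupancy_fn callback, and the splitting rule are illustrative assumptions, not the OctField implementation.

from dataclasses import dataclass, field
from typing import Callable, List, Optional
import numpy as np

@dataclass
class Octant:
    center: np.ndarray                      # (3,) octant center
    half_size: float                        # half of the octant edge length
    latent: Optional[np.ndarray] = None     # per-octant latent code (hypothetical)
    children: List["Octant"] = field(default_factory=list)

def subdivide(node: Octant, occupancy_fn: Callable[[Octant], float],
              tau: float, max_depth: int, depth: int = 0) -> None:
    # Split an octant into 8 children whenever its occupancy score exceeds tau.
    if depth >= max_depth or occupancy_fn(node) <= tau:
        return
    for dx in (-1, 1):
        for dy in (-1, 1):
            for dz in (-1, 1):
                offset = 0.5 * node.half_size * np.array([dx, dy, dz], dtype=float)
                child = Octant(center=node.center + offset, half_size=0.5 * node.half_size)
                node.children.append(child)
                subdivide(child, occupancy_fn, tau, max_depth, depth + 1)

# Toy example: an always-occupied region subdivided for two levels.
root = Octant(center=np.zeros(3), half_size=1.0)
subdivide(root, occupancy_fn=lambda n: 1.0, tau=0.5, max_depth=2)
print(len(root.children), len(root.children[0].children))   # 8 8

In an actual pipeline the occupancy score would be computed from the geometry contained in each octant; the constant lambda above is only a stand-in to exercise the recursion.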
Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space - Supplementary Material
This supplementary material is presented in a format parallel to the main paper. The section numbers and titles are consistent with the main paper. Similarly, the theorem numbers are consistent with the main paper, but we also include several additional theorems and lemmas that were not in the main paper. We begin with the set of assumptions on which our theory is developed. The input domains $X$ and $W$ are compact.
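For context (a standard identity, not necessarily the exact objective analyzed in this paper), discriminator-based KL estimators are typically built on the Donsker-Varadhan variational representation: for probability measures $P$ and $Q$ on the compact domain $X$,
$$D_{\mathrm{KL}}(P\,\|\,Q)=\sup_{f}\;\mathbb{E}_{P}[f]-\log\mathbb{E}_{Q}\!\left[e^{f}\right],$$
where the supremum runs over functions $f:X\to\mathbb{R}$ for which both expectations are finite. Restricting the discriminator $f$ to a ball in a reproducing kernel Hilbert space makes the inner optimization tractable and, since the supremum is then taken over a smaller class, yields a lower bound on the divergence.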