data-dependent sample complexity
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Existing Rademacher complexity bounds for neural networks rely only on norm control of the weight matrices and depend exponentially on depth via a product of the matrix norms. Lower bounds show that this exponential dependence on depth is unavoidable when no additional properties of the training data are considered. We suspect that this conundrum comes from the fact that these bounds depend on the training data only through the margin. In practice, many data-dependent techniques such as Batchnorm improve the generalization performance. For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers. Our bounds scale polynomially in depth when these empirical quantities are small, as is usually the case in practice. To obtain these bounds, we develop general tools for augmenting a sequence of functions to make their composition Lipschitz and then covering the augmented functions. Inspired by our theory, we directly regularize the network's Jacobians during training and empirically demonstrate that this improves test performance.
Reviews: Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
This paper developed a new generalization error bounds for smooth activation deep neural networks which is norm-based and has only polynomial dependence on the depth unlike (most of) previous work. The bound is tighter than the ever obtained ones in the same direction, and gives new insight on the generalization error analysis for the deep learning. In particular, it connects the deep and shallow network analysis more appropriately than existing researches. The paper would benefit from more intuitive expositions of the bound and more comparisons with existing bounds on real datasets.
Reviews: Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Generalization bounds on neural nets, based on Rademacher complexity, use the norm bounds on weights of layers, which gives an exponential dependence on depth. Moreover, existing lower bounds show that this is unavoidable (in general). The goal of the paper is to get bounds polynomial in depth by additionally using properties of training data. However, such data dependent bounds comes with challenges, discussed in the paper. The authors introduce "augmenting" the loss function with desirable properties and present tools to derive covering bounds on augmented loss. Comments: 1. Data-dependent generalization bounds have recently become popular to derive sharper generalization bounds. This paper contributes to this line of work by considering properties of training data, in particular norms of layers and norms of Jacobians of laters with other layers. They paper presents the (novel) idea of augmenting the loss function with the desirable properties, they then derive generalization bounds on the augmented loss.
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Existing Rademacher complexity bounds for neural networks rely only on norm control of the weight matrices and depend exponentially on depth via a product of the matrix norms. Lower bounds show that this exponential dependence on depth is unavoidable when no additional properties of the training data are considered. We suspect that this conundrum comes from the fact that these bounds depend on the training data only through the margin. In practice, many data-dependent techniques such as Batchnorm improve the generalization performance. For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers.
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Existing Rademacher complexity bounds for neural networks rely only on norm control of the weight matrices and depend exponentially on depth via a product of the matrix norms. Lower bounds show that this exponential dependence on depth is unavoidable when no additional properties of the training data are considered. We suspect that this conundrum comes from the fact that these bounds depend on the training data only through the margin. In practice, many data-dependent techniques such as Batchnorm improve the generalization performance. For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers.