Linear Regions


Effects of Data Geometry in Early Deep Learning

Neural Information Processing Systems

Deep neural networks can approximate functions on different types of data, from images to graphs, with varied underlying structure. This underlying structure can be viewed as the geometry of the data manifold. By extending recent advances in the theoretical understanding of neural networks, we study how a randomly initialized neural network with piecewise linear activation splits the data manifold into regions where the neural network behaves as a linear function. We derive bounds on the density of the boundaries of linear regions and on the distance to these boundaries on the data manifold. This leads to insights into the expressivity of randomly initialized deep neural networks on non-Euclidean data sets. We empirically corroborate our theoretical results using a toy supervised learning problem. Our experiments demonstrate that the number of linear regions varies across manifolds and that the results hold across different neural network architectures. We further demonstrate how the complexity of linear regions differs on the low-dimensional manifold of images as compared to the Euclidean space, using the MetFaces dataset.
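The idea of a network splitting a data manifold into linear regions can be illustrated directly: as we move along a curve on the manifold, the set of active ReLU units (the activation pattern) changes each time we cross a region boundary. The following is a minimal sketch, not the paper's method; the two-layer network, the unit circle as a stand-in data manifold, and the sampling resolution are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small randomly initialized ReLU network on 2-D inputs (illustrative sizes).
W1 = rng.standard_normal((16, 2)); b1 = rng.standard_normal(16)
W2 = rng.standard_normal((16, 16)); b2 = rng.standard_normal(16)

def activation_pattern(x):
    """On/off pattern of every ReLU unit at input x; constant within a linear region."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

# Sample the unit circle (a 1-D manifold in R^2) densely; each change of the
# activation pattern between neighbouring samples crosses a region boundary.
ts = np.linspace(0, 2 * np.pi, 20000, endpoint=False)
points = np.stack([np.cos(ts), np.sin(ts)], axis=1)
patterns = [activation_pattern(p) for p in points]
crossings = sum(p != q for p, q in zip(patterns, patterns[1:] + patterns[:1]))
print("boundary crossings along the circle:", crossings)
```

The crossing count is a proxy for the density of region boundaries along the manifold; a finer sampling grid gives a more reliable count.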


Understanding the Evolution of Linear Regions in Deep Reinforcement Learning

Neural Information Processing Systems

Policies produced by deep reinforcement learning are typically characterised by their learning curves, but they remain poorly understood in many other respects. ReLU-based policies partition the input space into piecewise linear regions. We seek to understand how observed region counts and their densities evolve during deep reinforcement learning, using empirical results that span a range of continuous control tasks and policy network dimensions. Intuitively, we may expect that during training the region density increases in the areas frequently visited by the policy, thereby affording fine-grained control. We use recent theoretical and empirical results on the linear regions induced by neural networks in supervised learning settings to ground and compare our results. Empirically, we find that the region density increases only moderately throughout training, as measured along fixed trajectories coming from the final policy. However, the trajectories themselves also increase in length during training, and thus the region densities decrease as seen from the perspective of the current trajectory. Our findings suggest that the complexity of deep reinforcement learning policies does not principally emerge from a significant growth in the complexity of functions observed on and around trajectories of the policy.
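The notion of region density along a trajectory can be made concrete: count how many region boundaries a path through observation space crosses, then divide by the path's length. A minimal sketch under assumed toy settings (a one-hidden-layer "policy" network, a straight-line trajectory, and arbitrary sizes, none of which come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ReLU "policy" network: 4-D observations, one hidden layer of 32 units.
W1 = rng.standard_normal((32, 4)); b1 = rng.standard_normal(32)

def pattern(x):
    """Hidden-unit activation pattern; constant within one linear region."""
    return tuple((W1 @ x + b1 > 0).astype(int))

# A straight-line "trajectory" through observation space, finely discretised.
start, end = rng.standard_normal(4), rng.standard_normal(4)
steps = np.linspace(0.0, 1.0, 5000)
traj = [start + t * (end - start) for t in steps]

patterns = [pattern(x) for x in traj]
crossings = sum(p != q for p, q in zip(patterns, patterns[1:]))
length = float(np.linalg.norm(end - start))
print("region density (boundary crossings per unit length):", crossings / length)
```

For a one-hidden-layer network, each unit contributes a single hyperplane, so a straight segment can cross at most as many boundaries as there are hidden units; deeper networks can fold the input and exceed this.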


Complexity of One-Dimensional ReLU DNNs

Kogan, Jonathan, Jananthan, Hayden, Kepner, Jeremy

arXiv.org Machine Learning

We study the expressivity of one-dimensional (1D) ReLU deep neural networks through the lens of their linear regions. We also propose a function-adaptive notion of sparsity that compares the expected number of regions used by the network to the minimal number needed to approximate a target within a fixed tolerance. Deep Neural Networks (DNNs) with Rectified Linear Unit (ReLU) activation functions are piecewise-linear functions whose expressive power can be studied via the number of linear regions that they create [1]-[3]. However, achieving such approximations for complicated functions typically demands substantial computational resources. The Lottery Ticket Hypothesis states that many connections can often be removed while maintaining similar performance, motivating the study of sparse DNNs [7].
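In the 1D case the linear regions are easy to count exactly: for a one-hidden-layer network, each unit relu(w_i x + b_i) contributes a single breakpoint at x = -b_i / w_i, and the distinct breakpoints split the line into at most n + 1 regions. A minimal sketch of this counting argument (the shallow architecture and random weights are illustrative assumptions, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8  # hidden units in a one-hidden-layer 1-D ReLU network
w, b = rng.standard_normal(n), rng.standard_normal(n)

# Each unit relu(w_i * x + b_i) is linear except at its breakpoint
# x = -b_i / w_i (when w_i != 0).  The distinct breakpoints partition
# the real line, so a width-n shallow network has at most n + 1 regions.
breakpoints = np.unique(-b[w != 0] / w[w != 0])
num_regions = len(breakpoints) + 1
print("breakpoints:", breakpoints.size, "linear regions:", num_regions)
```

With generic (e.g. continuously distributed) weights all breakpoints are distinct and the bound n + 1 is attained; depth is what allows the region count to grow beyond linearly in the number of units.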


Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes

Matt Jordan, Justin Lewis, Alexandros G. Dimakis

Neural Information Processing Systems

We relate the problem of computing the pointwise robustness of these networks to that of computing the maximum norm ball with a fixed center that can be contained in a non-convex polytope. This is a challenging problem in general; however, we show that there exists an efficient algorithm to compute this for polyhedral complexes.
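For a single convex polytope {x : Ax <= b}, the largest L2 ball centred at a point c has a closed form: its radius is the distance from c to the nearest facet, r = min_i (b_i - a_i . c) / ||a_i||. The paper's harder setting is a non-convex union of such polytopes; the sketch below covers only the single-polytope building block, with a hypothetical helper name.

```python
import numpy as np

def ball_radius(A, b, c):
    """Radius of the largest L2 ball centred at c inside {x : A x <= b}."""
    slack = b - A @ c
    if np.any(slack < 0):
        raise ValueError("centre is outside the polytope")
    # Distance from c to facet i is slack_i / ||a_i||; take the minimum.
    return float(np.min(slack / np.linalg.norm(A, axis=1)))

# Unit box [-1, 1]^2 written as A x <= b; ball centred at the origin.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
print(ball_radius(A, b, np.zeros(2)))  # -> 1.0
```

Over a union of polytopes the pointwise maximum of these per-piece radii is no longer the answer, since the ball may straddle several pieces; that is precisely the difficulty the abstract refers to.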



Is Deeper Better only when Shallow is Good?

Eran Malach, Shai Shalev-Shwartz

Neural Information Processing Systems

While current work accounts for the importance of depth for the expressive power of neural networks, it remains an open question whether these benefits are exploited during a gradient-based optimization process.




Appendix

Neural Information Processing Systems

The appendix is organized as follows. Appendix A: proofs related to activation patterns and activation regions. Appendix B: proofs related to the number of regions attained with positive probability. Appendix D: proofs related to the expected volume of activation regions. Appendix E: proofs related to the expected number of activation regions.


On the Expected Complexity of Maxout Networks

Neural Information Processing Systems

Learning with neural networks relies on the complexity of the representable functions but, more importantly, on the particular assignment of typical parameters to functions of different complexity.