Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems
Vaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas
Deep Neural Networks (NNs) with many hidden layers are now at the core of modern machine learning applications and can achieve remarkable performance that was previously unattainable using shallow networks. But why are deeper networks better than shallow ones? Intuitively, the nature of the computation done by deep and shallow networks is different: simple one-hidden-layer NNs extract independent features of the input and return their weighted sum, while deeper NNs can compute features of features, so the features computed by deeper layers are no longer independent. Another line of intuition (Poole et al., 2016) is that highly complicated manifolds in input space can turn into flattened manifolds in hidden space, thus helping with downstream tasks (e.g., classification). To make these intuitions formal and understand the benefits of depth, researchers study the expressivity of NNs and prove depth separation results. Early results in this area, sometimes referred to as universality theorems (Cybenko, 1989; Hornik et al., 1989), state that NNs with just one hidden layer, equipped with standard activation units (e.g., sigmoids, ReLUs), are "dense" in the space of continuous functions, meaning that any continuous function can be approximated arbitrarily well by an appropriate combination of these activation units.
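To make the "weighted sum of features" picture of a one-hidden-layer network concrete, the following minimal sketch builds such a network with ReLU units on the interval [0, 1] and checks numerically that its worst-case error against a continuous target shrinks as the hidden layer gets wider. This is only an illustration under simplifying assumptions (one input dimension, ReLU activations, weights chosen by piecewise-linear interpolation rather than by training); it is not the proof technique of the cited universality theorems, and the names `fit_relu_interpolant`, `shallow_net`, and `max_error` are hypothetical helpers introduced here.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def fit_relu_interpolant(f, n_units, a=0.0, b=1.0):
    """Choose weights of a one-hidden-layer ReLU net that linearly
    interpolates f at n_units + 1 equally spaced knots on [a, b].

    Illustrative construction only (an assumption of this sketch).
    """
    knots = np.linspace(a, b, n_units + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)  # slope on each segment
    # The output weight of unit i is the change in slope at knot i.
    coeffs = np.diff(slopes, prepend=0.0)
    # One hidden unit per left knot; the output bias is f(a).
    return knots[:-1], coeffs, values[0]


def shallow_net(x, knots, coeffs, bias):
    """Evaluate bias + sum_i coeffs[i] * relu(x - knots[i])."""
    x = np.asarray(x, dtype=float)
    hidden = relu(x[..., None] - knots)  # one feature per hidden unit
    return bias + hidden @ coeffs        # weighted sum of the features


def max_error(f, n_units, n_test=2001):
    """Largest absolute error on a dense test grid over [0, 1]."""
    xs = np.linspace(0.0, 1.0, n_test)
    params = fit_relu_interpolant(f, n_units)
    return float(np.max(np.abs(shallow_net(xs, *params) - f(xs))))


if __name__ == "__main__":
    target = lambda x: np.sin(2 * np.pi * x)  # example continuous target
    for width in (4, 16, 64):
        print(width, max_error(target, width))
```

Each hidden unit here depends on the input alone, never on another unit, which matches the "independent features" description above. The sketch says nothing about how many units a shallow network needs compared with a deep one, which is the question depth separation results address.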
Mar-2-2020