Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems

Chatziafratis, Vaggos, Nagarajan, Sai Ganesh, Panageas, Ioannis

Mar-2-2020–arXiv.org Machine Learning

Deep Neural Networks (NNs) with many hidden layers are now at the core of modern machine learning applications and can achieve remarkable performance that was previously unattainable using shallow networks. But why are deeper networks better than shallow? Perhaps intuitively, one can understand that the nature of computation done by deep and shallow networks is different; simple one hidden layer NNs extract independent features of the input and return their weighted sum, while deeper NNs can compute features of features, making the features computed by deeper layers no longer independent. Another line of intuition (Poole et al. (2016)), is that highly complicated manifolds in input space can actually turn into flattened manifolds in hidden space, thus helping with downstream tasks (e.g., classification). To make the above intuitions formal and understand the benefits of depth, researchers try to understand the expressivity of NNs and prove depth separation results. Early results in this area sometimes referred to as universality theorems (Cybenko, 1989; Hornik et al., 1989), state that NNs of just one hidden layer, equipped with standard activation units (e.g., sigmoids, ReLUs etc.) are "dense" in the space of continuous functions, meaning that any continuous function can be represented by an appropriate combination of these activation units.

composition, growth rate, oscillation, (16 more...)

arXiv.org Machine Learning

Mar-2-2020

arXiv.org PDF

Add feedback

Country:
- Asia > Singapore (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found