Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
Amit Daniely, Roy Frostig, Yoram Singer
We develop a general duality between neural networks and compositional kernel Hilbert spaces. We introduce the notion of a computation skeleton, an acyclic graph that succinctly describes both a family of neural networks and a kernel space. Random neural networks are generated from a skeleton through node replication followed by sampling from a normal distribution to assign weights. The kernel space consists of functions that arise by compositions, averaging, and non-linear transformations governed by the skeleton's graph topology and activation functions. We prove that random networks induce representations which approximate the kernel space. In particular, it follows that random weight initialization often yields a favorable starting point for optimization despite the worst-case intractability of training neural networks.
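The abstract's correspondence between random networks and kernels can be illustrated with a minimal sketch (not the paper's general construction, only its simplest instance): replicate a single hidden node r times, draw the weights i.i.d. from a normal distribution, and check that inner products of the resulting random representations concentrate around a fixed kernel as r grows. The sketch assumes a ReLU activation, for which the limiting kernel has a known closed form (the degree-1 arc-cosine kernel of Cho and Saul, 2009); the helper names `random_features` and `arccos1_kernel` are illustrative, not from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def arccos1_kernel(x, y):
    # Closed form of E_w[relu(<w, x>) relu(<w, y>)] for w ~ N(0, I):
    # the degree-1 arc-cosine kernel (Cho & Saul, 2009), scaled by 1/2.
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def random_features(X, r, rng):
    # Replicate one hidden node r times with i.i.d. N(0, 1) weights and scale
    # by 1/sqrt(r), so that inner products of the representations are Monte
    # Carlo estimates of the expectation computed above.
    W = rng.standard_normal((r, X.shape[1]))
    return relu(X @ W.T) / np.sqrt(r)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
y = rng.standard_normal(5)
y /= np.linalg.norm(y)

print("closed form:", arccos1_kernel(x, y))
for r in (10, 100, 10_000):
    phi = random_features(np.stack([x, y]), r, rng)
    print(f"r = {r:>6}:", phi[0] @ phi[1])
```

Increasing r plays the role of node replication in the skeleton: the Monte Carlo estimate `phi[0] @ phi[1]` approaches the closed-form kernel value, which is the sense in which random networks "induce representations which approximate the kernel space."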
Reviews: Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
In fact, there is almost no discussion at all about the implications of the results of the paper. The notions of "computational skeleton" and "realisation of a skeleton" seem relatively interesting and aim at generalising the kernel constructions proposed in [13] and [29]. This "construction" is, in my view, the main contribution of the paper. From a theoretical point of view and from my understanding, the main results of the paper are Theorems 3 and 4. However, these results are not "so surprising", since, as confirmed by the authors, they can be interpreted as a kind of "law of large numbers": the random activation of the network is "counterbalanced" by the replication of the skeleton.
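The reviewer's "law of large numbers" reading can be made concrete via the notion of a dual activation, which maps a correlation ρ to the expected product of the activation applied to ρ-correlated standard Gaussians; replication then makes the empirical average over the r copies of a node concentrate around this expectation. The following is a sketch under that reading (notation and normalisation may differ slightly from the paper's, and the activation is assumed integrable enough for the law of large numbers to apply):

```latex
% Dual activation of \sigma: expected product over \rho-correlated standard Gaussians.
\hat{\sigma}(\rho) \;=\; \mathbb{E}_{(X,Y)\sim N_\rho}\bigl[\sigma(X)\,\sigma(Y)\bigr],
\qquad
N_\rho \;=\; \mathcal{N}\!\left(0,\,\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right).

% Replication as a law of large numbers: for unit vectors x, y and i.i.d.
% standard Gaussian weights w_1, \dots, w_r (the r replicas of one skeleton node),
\frac{1}{r}\sum_{i=1}^{r}
  \sigma\bigl(\langle w_i, x\rangle\bigr)\,\sigma\bigl(\langle w_i, y\rangle\bigr)
\;\xrightarrow{\;r\to\infty\;}\;
\hat{\sigma}\bigl(\langle x, y\rangle\bigr)
\quad\text{almost surely.}
```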