Differential equations as models of deep neural networks

Ruseckas, Julius

arXiv.org Machine Learning 

In this work we systematically analyze general properties of differential equations used as machine learning models. We demonstrate that the gradient of the loss function with respect to the hidden state can be considered as a generalized momentum conjugate to the hidden state, allowing application of the tools of classical mechanics. In addition, we show that not only residual networks, but also feedforward neural networks with small nonlinearities and weight matrices deviating only slightly from identity matrices can be related to differential equations. We propose a differential equation describing such networks and investigate its properties.

1 Introduction

Deep learning is a form of machine learning that uses neural networks with many hidden layers [1, 2]. Deep learning models have dramatically improved the state of the art in speech recognition, visual object recognition, object detection and many other domains [2]. As the number of layers in deep neural networks becomes large, it is possible to consider the layer number as a continuous variable [3] and represent the neural network by a differential equation. The connection between neural networks and differential equations first appeared with an additive model for continuous time recurrent neural networks, described by the differential equations [4]

$$\tau_i \frac{dx_i}{dt} = -x_i + \sum_{j=1}^{n} w_{j,i}\,\sigma(x_j - \theta_j) + I_i(t).$$

Hopfield's work [6] pioneered analog computation with continuous time recurrent neural networks as an alternative to running complex numerical algorithms on a digital computer. A Hopfield network has a quadratic form as a Lyapunov function for the activity dynamics. As a consequence, when started in any initial state, the network evolves to a final state that is a minimum of the Lyapunov function [7].
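To make the additive model concrete, the following is a minimal sketch of how it could be integrated numerically. It assumes the standard form of the model, tau_i dx_i/dt = -x_i + sum_j w_{j,i} sigma(x_j - theta_j) + I_i(t), a logistic function for sigma, and a fixed-step forward Euler scheme. None of these choices are prescribed by the text, and the names `sigmoid`, `ctrnn_step` and `simulate` are hypothetical helpers introduced only for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation; an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))


def ctrnn_step(x, w, theta, tau, inputs, dt):
    """One forward Euler step of the additive model.

    x, theta, tau, inputs are lists of length n; w[j][i] is the weight
    from unit j to unit i, matching the index order w_{j,i}.
    """
    n = len(x)
    # Activations sigma(x_j - theta_j) of all units at the current time.
    act = [sigmoid(x[j] - theta[j]) for j in range(n)]
    new_x = []
    for i in range(n):
        drive = sum(w[j][i] * act[j] for j in range(n))
        dxdt = (-x[i] + drive + inputs[i]) / tau[i]
        new_x.append(x[i] + dt * dxdt)
    return new_x


def simulate(x0, w, theta, tau, input_fn, dt=0.01, steps=1000):
    """Integrate from x0 for `steps` Euler steps; input_fn(t) returns I(t)."""
    x = list(x0)
    for k in range(steps):
        x = ctrnn_step(x, w, theta, tau, input_fn(k * dt), dt)
    return x


if __name__ == "__main__":
    # Two uncoupled units (all weights zero) relax toward their constant inputs.
    zero_w = [[0.0, 0.0], [0.0, 0.0]]
    final = simulate([0.0, 0.0], zero_w, [0.0, 0.0], [1.0, 1.0],
                     lambda t: [0.5, -1.0], dt=0.01, steps=2000)
    print(final)
```

With all weights set to zero each unit reduces to a leaky integrator, so the state relaxes toward the external input, which gives a simple sanity check. Forward Euler is used only for brevity; an adaptive ODE solver would be a more careful choice when accuracy matters.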
