ReLU activated Multi-Layer Neural Networks trained with Mixed Integer Linear Programs
Neural networks typically learn by adjusting their weights via nonlinear optimization in a training phase, most often with variants of gradient descent. These techniques require some form of differentiability. Non-smooth but piecewise linear activation functions like ReLU or the Heaviside function therefore raise the question of whether techniques from linear and mixed integer linear programming are also suited for network training. For certain network architectures, learning to near optimality can be performed with linear programs (LPs) of exponential size, see [2].
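The connection between ReLU and mixed integer linear programming rests on the fact that y = ReLU(x) = max(0, x) can be expressed exactly with linear constraints and one binary variable, using the standard big-M encoding (y >= x, y >= 0, y <= x + M(1 - z), y <= Mz, z in {0, 1}, for a bound M on |x|). A minimal sketch, verifying this encoding by enumeration rather than with a solver; the constants and helper names are illustrative, not taken from the paper:

```python
# Big-M MILP encoding of y = ReLU(x) = max(0, x), checked by enumeration.
# M is an assumed a-priori bound on |x|; no MILP solver is needed here.
M = 10.0

def feasible_y(x, z, eps=1e-9):
    """Interval of y values allowed by the big-M constraints for fixed x, z."""
    lo = max(x, 0.0)                    # y >= x and y >= 0
    hi = min(x + M * (1 - z), M * z)    # y <= x + M(1-z) and y <= M*z
    return (lo, hi) if lo <= hi + eps else None

def relu_via_milp(x):
    """Collect all y values consistent with some binary choice of z."""
    ys = set()
    for z in (0, 1):
        box = feasible_y(x, z)
        if box is not None:
            lo, hi = box
            # For feasible z the constraints pin y to a single value.
            ys.add(round(lo, 9))
            ys.add(round(hi, 9))
    return ys

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert relu_via_milp(x) == {round(max(0.0, x), 9)}, x
print("big-M encoding matches ReLU on all test inputs")
```

For x > 0 only z = 1 is feasible (forcing y = x), and for x < 0 only z = 0 is feasible (forcing y = 0), so the feasible set encodes the ReLU graph exactly on [-M, M]. Stacking such constraints per neuron is how whole ReLU networks are embedded into a MILP.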
Aug-19-2020