Why Does Deep Learning Not Have a Local Minimum?