Reviews: Adding One Neuron Can Eliminate All Bad Local Minima

Neural Information Processing Systems 

The main contribution of this work is to prove that by adding a single exponential neuron connected directly from input to output, together with a mild l_2 regularizer on its weight, the slightly modified, still highly nonconvex loss function has no non-global local minima. Moreover, every local minimum of the modified loss corresponds to a global minimum of the original, unmodified nonconvex loss. This surprising result is, to the best of my knowledge, new and of genuine interest.

This phenomenon is a bit curious and perhaps deserves more elaboration. This, I am afraid, is likely what is going on here (if you drop the separable assumption).
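To make the construction under review concrete, here is a minimal sketch of the modified loss. The base model, data, and squared-hinge loss below are hypothetical choices for illustration only (the paper's result covers a broad class of losses); the structure shown — adding one exponential unit a*exp(w·x + b) from input to output plus an l_2 penalty on a — is the modification the review describes. Note that setting a = 0 recovers the original loss exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative, not from the paper).
X = rng.normal(size=(20, 3))
y = np.sign(X[:, 0] + 0.1)  # labels in {-1, +1}

def original_loss(theta, X, y):
    """Squared-hinge loss of a simple linear model f(x) = theta . x
    (a stand-in for the nonconvex network loss discussed in the review)."""
    margins = y * (X @ theta)
    return np.mean(np.maximum(0.0, 1.0 - margins) ** 2)

def modified_loss(theta, a, w, b, lam, X, y):
    """Original loss with one extra exponential neuron a * exp(w . x + b)
    wired directly from input to output, plus an l_2 penalty on a."""
    outputs = X @ theta + a * np.exp(X @ w + b)
    margins = y * outputs
    data_term = np.mean(np.maximum(0.0, 1.0 - margins) ** 2)
    return data_term + lam * a ** 2

theta = rng.normal(size=3)
w = rng.normal(size=3)

# With a = 0 the added neuron and its penalty vanish, so the
# modified loss coincides with the original loss at that slice.
print(original_loss(theta, X, y))
print(modified_loss(theta, 0.0, w, 0.0, 0.1, X, y))
```

The point of the construction is that the extra coordinate a gives every spurious local minimum a descent direction, while the regularizer forces a = 0 at any local minimum, so the remaining parameters sit at a global minimum of the original loss.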