Why Does Deep Learning Not Have a Local Minimum?
Editor's note: This post originally appeared as an answer to a Quora question, which also included the following: "As I understand, the chance of having a derivative be zero in each of the thousands of directions is low. Is there some other reason besides this?"

Yes, there is a 'theoretical justification', and it has taken a couple of decades to flesh it out. I will first point out, however, that this has been observed in practice. It was noted by LeCun in his early work on LeNet, and is actually discussed in the 'orange book', "Pattern Classification" by Richard O. Duda, Peter E. Hart, and David G. Stork. The problem was addressed in condensed matter physics 20 years ago in the study of spin glasses.
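The intuition in the question can be illustrated numerically. A critical point is a local minimum only if the Hessian there has all-positive eigenvalues; if, in the spirit of the spin-glass analysis, we model the Hessian at a random critical point as a random symmetric matrix, the fraction of such matrices that are positive definite collapses rapidly with dimension, so high-dimensional critical points are overwhelmingly saddle points. This is only an illustrative sketch under that random-matrix assumption, and `frac_all_positive` is a helper name of my own, not from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_all_positive(n, trials=2000):
    """Fraction of random n x n symmetric Gaussian matrices (a stand-in
    for Hessians at random critical points) whose eigenvalues are all
    positive, i.e. that would correspond to local minima."""
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((n, n))
        h = (a + a.T) / 2.0  # symmetrize to get a valid Hessian model
        if np.all(np.linalg.eigvalsh(h) > 0):
            count += 1
    return count / trials

for n in (1, 2, 4, 8):
    print(n, frac_all_positive(n))
```

For n = 1 roughly half of the samples are "minima", but by n = 8 essentially none are; with thousands of directions, a random critical point is a minimum with vanishing probability, which is the phenomenon the spin-glass literature makes precise.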
Jun-2-2017, 18:50:08 GMT