50 things I learned at NIPS 2016
Why does deep learning work now, but not 20 years ago, even though many of the core ideas were there? In one sentence: We have more data, more compute, better software engineering, and a few algorithmic innovations (many layers, ReLUs, better initialization and learning rates, dropout, LSTMs). But why does gradient-based optimization work at all in neural nets despite the non-convexity? One possible, partial answer is overprovisioning: There are generally many hidden units, and there are many ways a neural net can approximately implement the desired input-output relationship. You only need to find one.
Dec-20-2016, 10:25:17 GMT
- Technology: