[R] Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting • r/MachineLearning
I'm a little bewildered here. The paper says: "Note, that the softmax is not included in the table for the very simple reason that it gave miserable results on this NN configuration." Softmax with cross-entropy loss is the de facto output setup in FCNs. They don't specify whether that test used cross-entropy (CE) or MSE loss, but even if it used MSE (as a later experiment does), that just speaks to how poorly designed their network is (a 392-50-10 neuron layout is truly weird). The idea bears some resemblance to momentum, where we gradually speed things up when the error gradients are consistent.
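To make the momentum comparison concrete, here is a rough sketch of classical momentum on a single scalar weight. This is only the standard technique the comparison refers to, not the paper's method; the function name `momentum_steps` and the values `lr=0.1` and `beta=0.9` are assumptions picked for illustration.

```python
def momentum_steps(grads, lr=0.1, beta=0.9):
    """Return the size of each update under classical momentum.

    Hypothetical helper for illustration only: `grads` is a sequence of
    scalar gradients, `lr` the learning rate, `beta` the momentum factor.
    """
    velocity = 0.0
    steps = []
    for g in grads:
        # The velocity accumulates past gradients, decayed by beta.
        velocity = beta * velocity + g
        steps.append(lr * velocity)
    return steps


# Gradients that always point the same way: the steps keep growing.
consistent = momentum_steps([1.0] * 5)

# Gradients that flip sign every iteration: the steps mostly cancel out.
alternating = momentum_steps([1.0, -1.0] * 3)

print("consistent: ", [round(s, 4) for s in consistent])
print("alternating:", [round(s, 4) for s in alternating])
```

With consistent gradients each step is larger than the one before, while with alternating gradients the accumulated velocity stays small. That is the "speed things up when the gradients are consistent" behavior being compared against.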
Jul-14-2017, 02:05:07 GMT