[D] Any work on penalizing classifications for being too accurate? • r/MachineLearning
So I stumbled across a weird effect by mistake while training some CNNs on the MNIST dataset. I had implemented the gradient of the softmax layer incorrectly (I was multiplying it by an additional output * (1 - output)), but the odd thing was that I was getting better test predictions. Comparing a softmax-layer gradient of ex against ex * ex * (1 - ex), the latter actually did a fair bit better on the final test predictions. The extra factor basically forces the weights away from classifications that are too accurate, which I imagine does a pretty decent job of preventing overfitting. The correct gradient does train quicker and reaches much lower error rates during training, but it does a worse job of predicting on the test set.
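To make the comparison concrete, here is a minimal pure-Python sketch of the two gradients. It is not the code from the post (which isn't shown). It assumes a cross-entropy loss, so the usual gradient with respect to the logits is `p - y`, and it reads the bug as multiplying that gradient element-wise by `output * (1 - output)`. The function names `grad_standard` and `grad_scaled` are made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_standard(probs, target):
    # Assumed "correct" gradient: softmax + cross-entropy w.r.t. the logits is p - y.
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def grad_scaled(probs, target):
    # Assumed reading of the bug: an extra output * (1 - output) factor per class.
    base = grad_standard(probs, target)
    return [g * p * (1.0 - p) for g, p in zip(base, probs)]

# One very confident prediction and one uncertain prediction, both for class 0.
confident = softmax([8.0, 0.0, 0.0])
unsure = softmax([1.0, 0.5, 0.0])

for name, probs in (("confident", confident), ("unsure", unsure)):
    print(name)
    print("  standard:", grad_standard(probs, 0))
    print("  scaled:  ", grad_scaled(probs, 0))
```

Since `p * (1 - p)` is at most 0.25 and goes to zero as `p` approaches 0 or 1, the scaled variant shrinks every update, and shrinks it most for outputs the network is already very sure about. That would fit the slower training described above, though whether it explains the better test predictions is a guess.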
Dec-20-2017, 05:30:13 GMT