Mathematical Optimization: 'simplicity is all you need' • r/MachineLearning
What does it mean for you to do "y * dy/dx" and "y (dy/dx)^2"? I mean, are you doing elementwise operations? Also, could you report the magnitudes of both "y" and "dy/dx" on the plot? My guess is that the norm of the gradient might be very small and thus y dy/dx 2 and you would basically be doing SGD. Also, I don't think you tuned SGD, RMSprop and so on well enough, as they should also give excellent results on MNIST.
Jan-3-2018, 19:10:57 GMT
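To make the question concrete, here is a minimal sketch under one possible reading, which is an assumption and not confirmed by the post. It treats y as the scalar loss and dy/dx as the gradient vector, and it squares the gradient elementwise. The function names (`update_terms`, `report_magnitudes`, `l2_norm`) are hypothetical. The sketch computes the magnitudes the comment asks to see on the plot. With a small gradient, the squared term shrinks much faster than the plain one.

```python
import math


def l2_norm(v):
    # Euclidean norm of a list of floats.
    return math.sqrt(sum(x * x for x in v))


def update_terms(y, grad):
    # Assumption: y is the scalar loss and grad is the gradient dy/dx as a list.
    # "y * dy/dx": scale every gradient component by the loss value.
    y_grad = [y * g for g in grad]
    # "y (dy/dx)^2": square each gradient component elementwise, then scale by y.
    y_grad_sq = [y * g * g for g in grad]
    return y_grad, y_grad_sq


def report_magnitudes(y, grad):
    # Magnitudes one could plot alongside the training curve.
    y_grad, y_grad_sq = update_terms(y, grad)
    return {
        "|y|": abs(y),
        "||dy/dx||": l2_norm(grad),
        "||y * dy/dx||": l2_norm(y_grad),
        "||y * (dy/dx)^2||": l2_norm(y_grad_sq),
    }


if __name__ == "__main__":
    # Hypothetical values: a small gradient makes the squared term much smaller.
    stats = report_magnitudes(0.5, [1e-3, -2e-3, 5e-4])
    for name, value in stats.items():
        print(f"{name:>20}: {value:.3e}")
```

If the logged norm of y * (dy/dx)^2 stays orders of magnitude below that of y * dy/dx throughout training, that would support the guess that the update behaves much like plain SGD.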