Mathematical Optimization: 'simplicity is all you need' • r/MachineLearning

@machinelearnbot 

What do you mean by "y * dy/dx" and "y (dy/dx)^2"? Are these elementwise operations? Also, could you report the magnitudes of both "y" and "dy/dx" on the plot? My guess is that the norm of the gradient is very small, so the "y (dy/dx)^2" term would be negligible and you would basically be doing SGD. Also, I don't think you tuned SGD, RMSprop, and so on very well, as they should also give excellent results on MNIST.
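
To make the magnitude question concrete, here is a rough sketch of the check I have in mind. It assumes the elementwise reading (a scalar "y" multiplied into each component of "dy/dx"); the helper names and the numbers are made up for illustration and are not taken from the original post.

```python
import math

def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def elementwise_terms(y, grad):
    """Return (y * g, y * g**2), both computed per component of grad.

    Assumption: y is a scalar (e.g. the loss value) and the square
    is applied elementwise, not as a dot product.
    """
    first = [y * g for g in grad]
    second = [y * g * g for g in grad]
    return first, second

# Hypothetical values: a moderate loss and a small gradient.
y = 0.5
grad = [1e-3, -2e-3, 5e-4]

first, second = elementwise_terms(y, grad)

# If this ratio is tiny, the squared term contributes almost nothing
# and the step is dominated by the plain gradient direction.
ratio = l2_norm(second) / l2_norm(first)
print("norm(y*g^2) / norm(y*g) =", ratio)
```

If the ratio comes out this small on your actual runs, the update direction is essentially the plain gradient scaled by "y", which is why reporting both magnitudes on the plot would help.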
