Why Gradient Descent for Optimization?
I have a question about the optimization technique used for updating the weights. People generally use gradient descent for optimization, whether it's SGD or an adaptive variant. Why can't we use other techniques, like Newton-Raphson?
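To make the comparison concrete, here is a minimal sketch of the two update rules side by side. It is only an illustration, not part of the original question: the toy quadratic loss, the matrix `A`, the vector `b`, the learning rate, and all function names are assumptions chosen to keep the example small.

```python
import numpy as np

# Hypothetical toy loss: f(w) = 0.5 * w^T A w - b^T w,
# with A symmetric positive definite (values chosen for illustration).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(w):
    # Gradient of the quadratic: A w - b
    return A @ w - b

def hessian(w):
    # Hessian of the quadratic is constant and equal to A
    return A

def gradient_descent_step(w, lr=0.1):
    # First-order update: uses only the gradient (one vector of size n).
    return w - lr * grad(w)

def newton_step(w):
    # Second-order update: solves H d = g, which needs the full n x n
    # Hessian and a dense linear solve.
    return w - np.linalg.solve(hessian(w), grad(w))

w0 = np.zeros(2)

# On a quadratic, a single Newton step lands on the minimizer.
w_newton = newton_step(w0)

# Gradient descent approaches the same point over many cheap steps.
w_gd = w0
for _ in range(200):
    w_gd = gradient_descent_step(w_gd)

print(w_newton, w_gd)
```

With two weights the Hessian is a 2 x 2 matrix and the solve is trivial. With n weights it is n x n, so storing it takes O(n^2) memory and a dense solve takes roughly O(n^3) time, while the gradient step only ever touches a vector of length n.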
Mar-12-2018, 00:30:13 GMT