Difference between Batch Gradient Descent and Stochastic Gradient Descent
Now, what was the Gradient Descent algorithm? Above algorithm says, to perform the GD, we need to calculate the gradient of the cost function J. And to calculate the gradient of the cost function, we need to sum (yellow circle!) the cost of each sample. If we have 3 million samples, we have to loop through 3 million times or use the dot product. Do you see np.dot(X.T, y_hat-y) above?
Jan-22-2019, 04:38:16 GMT
- Technology: