
The error is calculated on a single data point only. Because there is just one data point, the division by N is omitted. This gradient descent will generally be more accurate than stochastic gradient descent, because stochastic gradient descent uses only one data point for each error calculation.
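
To make the role of the division by N concrete, here is a minimal sketch. It assumes a one-feature linear model `y_hat = w * x + b` and a squared-error loss of `0.5 * error**2`, neither of which is specified in the text. The function names `single_point_step` and `averaged_step`, and the learning rate `lr`, are hypothetical and chosen only for illustration.

```python
def single_point_step(w, b, x, y, lr=0.01):
    """One update computed from a single (x, y) pair (assumed linear model)."""
    error = (w * x + b) - y  # error on this one data point only
    # Gradient of 0.5 * error**2: no sum over the data set, so no 1/N factor.
    grad_w = error * x
    grad_b = error
    return w - lr * grad_w, b - lr * grad_b


def averaged_step(w, b, xs, ys, lr=0.01):
    """One update computed from N points, with the gradient averaged over N."""
    n = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Here the summed gradient is divided by N.
    grad_w = sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = sum(errors) / n
    return w - lr * grad_w, b - lr * grad_b
```

With N = 1 the two functions give the same update, which is why the division can be left out in the single-point case. With more points, the averaged gradient is a less noisy estimate, which is one way to read the accuracy comparison above.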