GPU Memory Usage Optimization for Backward Propagation in Deep Network Training