Improved knowledge distillation by utilizing backward pass knowledge in neural networks