Sometimes, you see a diagram and it gives you an'aha ha' moment I saw it on Frederick kratzert's blog Using the input variables x and y, The forwardpass (left half of the figure) calculates output z as a function of x and y i.e. f(x,y) The right side of the figures shows the backwardpass. Receiving dL/dz (the derivative of the total loss with respect to the output z), we can calculate the individual gradients of x and y on the loss function by applying the chain rule, as shown in the figure. This post is a part of my forthcoming book on Mathematical foundations of Data Science. The goal of the neural network is to minimise the loss function for the whole network of neurons. Hence, the problem of solving equations represented by the neural network also becomes a problem of minimising the loss function for the entire network.

This article serves as a good exercise to see how forward propagation works and then how the gradients are computed to implement the backpropagation algorithm. Also, the reader will get comfortable with the computation of vector, tensor derivatives and vector/matrix calculus. A useful document can be found here for the interested reader to get familiar with tensor operations.

I prefer Option 2 and take that approach to learning any new topic. I might not be able to tell you the entire math behind an algorithm, but I can tell you the intuition. I can tell you the best scenarios to apply an algorithm based on my experiments and understanding. In my interactions with people, I find that people don't take time to develop this intuition and hence they struggle to apply things in the right manner. In this article, I will discuss the building block of a neural network from scratch and focus more on developing this intuition to apply Neural networks.

Using high-level frameworks like Keras, TensorFlow or PyTorch allows us to build very complex models quickly. However, it is worth taking the time to look inside and understand underlying concepts. Not so long ago I published an article, explaining -- in a simple way -- how neural nets work. However, it was highly theoretical post, dedicated primarily to math, which is the source of NN superpower. From the beginning I was planning to follow-up this topic in a more practical way.

Let's assume we are really into mountain climbing, and to add a little extra challenge, we cover eyes this time so that we can't see where we are and when we accomplished our "objective," that is, reaching the top of the mountain. Since we can't see the path upfront, we let our intuition guide us: assuming that the mountain top is the "highest" point of the mountain, we think that the steepest path leads us to the top most efficiently. We approach this challenge by iteratively "feeling" around you and taking a step into the direction of the steepest ascent -- let's call it "gradient ascent." But what do we do if we reach a point where we can't ascent any further? I.e., each direction leads downwards?