When we are programming Logistic Regression or Neural Networks we should avoid explicit \(for \) loops. It's not always possible, but when we can, we should use built-in functions or find some other way to compute the result. Vectorizing the implementation of Logistic Regression makes the code highly efficient. In this post we will see how we can use this technique to compute gradient descent without using even a single \(for \) loop. The earlier code was non-vectorized and highly inefficient, so we need to transform it.
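As a sketch of what such a vectorized version can look like, here is one gradient descent step for Logistic Regression written entirely with NumPy matrix operations; the shapes (\(X \) of size \(n_x \times m \), \(Y \) of size \(1 \times m \)) and the function names are illustrative assumptions, not code from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(w, b, X, Y, alpha):
    """One vectorized step: no explicit loop over the m examples.

    Assumed shapes: X is (n_x, m), Y is (1, m), w is (n_x, 1), b is a scalar.
    """
    m = X.shape[1]
    Z = np.dot(w.T, X) + b        # (1, m): all pre-activations at once
    A = sigmoid(Z)                # (1, m): all predictions at once
    dZ = A - Y                    # (1, m): error term for every example
    dw = np.dot(X, dZ.T) / m      # (n_x, 1): gradient w.r.t. the weights
    db = np.sum(dZ) / m           # scalar: gradient w.r.t. the bias
    return w - alpha * dw, b - alpha * db
```

Every line processes all \(m \) examples in a single matrix operation, which is exactly what replacing the \(for \) loop means in practice.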

More generally, if we have not just one filter but multiple filters, then it's as if we have not just one unit but multiple units that take as inputs all the numbers in one slice, and build them up into an output of dimension \(6\times 6\times number \enspace of \enspace filters \). One way to think about the \(1\times 1 \) convolution is that it is basically like a fully connected neural network that applies to each of the \(36 \) different positions. This fully connected network has a \(32 \) dimensional input, and its number of outputs equals the number of \(1\times 1 \) filters applied. Applying it at each of the \(36 \) positions, we end up with an output of dimension \(6\times 6 \times number \enspace of \enspace filters \). This can carry out a pretty non-trivial computation on our input volume.
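This "fully connected network at every position" view can be sketched directly in NumPy: a \(1\times 1 \) convolution over a \(6\times 6\times 32 \) volume is just one matrix multiply applied to all \(36 \) positions at once. The function name and shapes below are illustrative assumptions:

```python
import numpy as np

def conv_1x1(volume, filters):
    """1x1 convolution as a per-position fully connected layer.

    Assumed shapes: volume is (6, 6, 32); filters is (32, n_f),
    one column per 1x1 filter.
    """
    h, w, c = volume.shape
    flat = volume.reshape(h * w, c)   # 36 rows, each a 32-dim input vector
    out = flat.dot(filters)           # the same "fully connected" step at every position
    return out.reshape(h, w, filters.shape[1])  # (6, 6, n_f)
```

Each row of `flat` is the \(32 \)-dimensional input at one spatial position, and the single `dot` computes all \(36 \) outputs simultaneously.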

We're going to take these \(400 \) units and build the next layer with \(120 \) units. So, this is actually our first \(Fully \enspace connected \) layer. In this layer we have \(400 \) units densely connected to \(120 \) units. This \(Fully \enspace connected \) layer is just like a single neural network layer. Hence, this is a standard neural network layer where we have a weight matrix \(W \) of dimension \(120 \times 400 \).
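A minimal sketch of this layer in NumPy, assuming the flattened \(400 \)-unit input is a column vector and using ReLU as the activation (an assumption; the original LeNet-style networks used other activations):

```python
import numpy as np

def fully_connected(a_prev, W, b):
    """Standard dense layer: a_prev is (400, 1), W is (120, 400), b is (120, 1)."""
    z = W.dot(a_prev) + b        # (120, 1) pre-activations
    return np.maximum(0, z)      # ReLU activation (illustrative assumption)
```

The weight matrix shape \(120 \times 400 \) is exactly the one described above: one row per output unit, one column per input unit.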

Let's go through one example that illustrates why \(ResNets \) work so well, at least in the sense of how we can make them deeper and deeper without really hurting our ability to get them to do well on the training set. Doing well on the training set is usually a prerequisite to doing well on the test set, so being able to at least train a \(ResNet \) to do well on the training set is a good first step. In the last post we saw that making a plain network deeper can decrease our ability to train it well on the training set, which is why we sometimes avoid making neural networks too deep.
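The core of the argument can be sketched numerically: a residual block computes \(a^{[l+2]} = g(z^{[l+2]} + a^{[l]}) \), so if the weights of the second layer shrink toward zero, the block falls back to the identity and cannot hurt the network. The shapes and names below are illustrative assumptions, with ReLU as \(g \):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a, W1, b1, W2, b2):
    """Two-layer residual block with an identity skip connection.

    Assumed shapes: a is (n, 1); W1, W2 are (n, n); b1, b2 are (n, 1).
    """
    a1 = relu(W1.dot(a) + b1)    # main path, first layer
    z2 = W2.dot(a1) + b2         # main path, second layer
    return relu(z2 + a)          # skip connection: add the input back in
```

If `W2` and `b2` are zero, the output is `relu(a)`, which equals `a` whenever `a` is a (non-negative) ReLU activation, so the deeper block can easily learn the identity function.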