In the previous section, taught you about the log-loss function. There are many other error functions used for neural networks. Let me teach you another one, called the mean squared error. As the name says, this one is the mean of the squares of the differences between the predictions and the labels. In the following section I'll go over it in detail, then we'll get to implement backpropagation with it on the same student admissions dataset. And as a bonus, we'll be implementing this in a very effective way using matrix multiplication with NumPy! We want to find the weights for our neural networks.

Pu, Yuchen, Gan, Zhe, Henao, Ricardo, Li, Chunyuan, Han, Shaobo, Carin, Lawrence

A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupervised and semi-supervised problems, including semi-supervised analysis of the ImageNet data, demonstrating the scalability of the model to large datasets. Papers published at the Neural Information Processing Systems Conference.

In the end training a network is the solution of a very high dimensional non-linear optimization problem ( finding the minimum of a function). The graphic shows a two dimensional optimization problem and how the gradient descent algorithm aproaches the minimum ( center). In 2D this is trivial, in higher dimensions computationally intensive. The slider sets the step size. You want big steps to find the solution fast, but if the step size gets to big, the optimizer starts to oscilate.