Improving Vanilla Gradient Descent – Towards Data Science
When we train neural networks with gradient descent, we risk the network settling into a local minimum: a point on the error surface where training stalls even though it is not the lowest point on the overall surface. This happens because the error surfaces are not generally convex, so they may contain many local minima distinct from the global minimum. Additionally, even if the network reaches a global minimum and converges to a desirable point for the training data, there is no guarantee of how well it will generalize what it has learned. Neural networks are therefore prone to overfitting the training data. Several techniques can help mitigate these issues, although none can definitively prevent them, because the error surfaces of these networks tend to be difficult to traverse and neural networks as a whole are rather difficult to interpret.
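To make the local-minimum problem concrete, here is a minimal sketch of vanilla gradient descent on a one-dimensional non-convex function. The function f(x) = x^4 - 3x^2 + x, the learning rate, the step count, and the starting points are all assumptions chosen for illustration; they are not taken from the original article, and a real network's error surface has far more dimensions.

```python
# Toy non-convex "error surface" (an illustrative assumption, not a real network loss).
# It has a global minimum near x = -1.30 and a shallower local minimum near x = 1.13.
def f(x):
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=500):
    """Plain (vanilla) gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


# Same algorithm and hyperparameters, two different starting points.
x_right = gradient_descent(2.0)   # starts on the right-hand slope
x_left = gradient_descent(-2.0)   # starts on the left-hand slope

print(f"start  2.0 -> x = {x_right:.3f}, f(x) = {f(x_right):.3f}")
print(f"start -2.0 -> x = {x_left:.3f}, f(x) = {f(x_left):.3f}")
```

In this sketch the run started at 2.0 should stop at roughly x ≈ 1.13, while the run started at -2.0 should reach roughly x ≈ -1.30, where the error is lower. Both runs have a near-zero gradient when they stop, so the algorithm by itself cannot tell which of the two points is the global minimum.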
Feb-24-2018, 21:55:28 GMT