In today's post, we will discuss one of the most common problems that arises when training deep neural networks: overfitting. It usually occurs as we increase the complexity of the network. Our goal when building a neural network is a model that performs well not only on the training dataset but also on new data it wasn't trained on. When the model is too complex, however, it can start to learn irrelevant information in the dataset: it memorizes noise that is specific to the training data. In this post, you will learn the most common techniques for reducing overfitting while training neural networks.
In this article, I'll cover various techniques for handling overfitting and underfitting. I'll briefly describe what underfitting and overfitting are, and then discuss techniques for dealing with them. In one of my earlier articles, I talked about the bias-variance trade-off: how bias and variance relate to model complexity, and what underfitting and overfitting look like. If those terms are unfamiliar, I'd encourage you to read that article first. For a quick recap, let's look at the following figure.
Deep neural networks have a multitude of parameters to train and test. With a large number of parameters, neural networks have the freedom to fit many types of datasets, which is what makes them so powerful. But sometimes this power is also what makes them weak. The network can lose control over the learning process and start to memorize individual data points, causing it to perform well on the training data but poorly on the test dataset. Overfitting is especially likely when the training data is noisy, because the model fits the noise instead of the underlying pattern.
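We can see this memorization effect without a neural network at all. In the following sketch (the data, noise level, and polynomial degrees are made up for illustration), a simple quadratic model and a very flexible high-degree polynomial are fit to the same noisy samples; the flexible model drives its training error down by chasing the noise, but its error on fresh data tells a different story.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an underlying quadratic relationship.
x = rng.uniform(-1, 1, 30)
y = x**2 + rng.normal(0, 0.1, 30)

# Fresh samples from the same process, never seen during fitting.
x_test = rng.uniform(-1, 1, 30)
y_test = x_test**2 + rng.normal(0, 0.1, 30)

def fit_and_errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

simple_train, simple_test = fit_and_errors(2)     # matches the true model
complex_train, complex_test = fit_and_errors(20)  # far more capacity than needed

# The degree-20 fit achieves a lower training error by memorizing noise,
# but its test error is much larger than its training error.
```

The same pattern appears in neural networks: an over-parameterized model with a shrinking training loss and a growing validation loss is overfitting.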
In this article, we will discuss the regularization and optimization techniques that programmers use to build more robust, generalized neural networks. We will study the most effective regularization techniques, such as L1, L2, early stopping, and dropout, which help the model generalize, and take a deeper look at optimization techniques such as batch gradient descent, stochastic gradient descent, AdaGrad, and AdaDelta for better convergence. Overfitting and underfitting are the most common problems programmers face while working with deep learning models. A model that generalizes well to the data is considered an optimal fit.
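To give a feel for how one of these techniques works before we dive in, here is a minimal NumPy sketch of L2 regularization (the toy data and hyperparameters are made up for illustration). L2 adds a penalty proportional to the weights to the gradient, which shows up in gradient descent as a "weight decay" term that pulls the weights toward zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: 5 features, but only the first one actually matters.
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, 100)

def train(l2=0.0, lr=0.1, steps=500):
    """Linear regression by gradient descent, with optional L2 penalty."""
    w = np.zeros(5)
    for _ in range(steps):
        # Gradient of the mean squared error, plus the L2 (weight-decay) term.
        grad = X.T @ (X @ w - y) / len(y) + l2 * w
        w -= lr * grad
    return w

w_plain = train(l2=0.0)  # unregularized fit
w_reg = train(l2=1.0)    # L2-regularized fit: weights are shrunk toward zero
```

The regularized weight vector has a smaller norm than the unregularized one; in a neural network the same penalty discourages any single weight from growing large enough to memorize individual training examples.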
The objective of a neural network is a final model that performs well both on the data used to train it (i.e. the training dataset) and on the new data on which the model will make predictions. The central challenge in machine learning is that we must perform well on new, previously unseen inputs, not just those on which our model was trained. The ability to perform well on previously unobserved inputs is called generalization.
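In practice, generalization is monitored with a held-out validation set, and one of the techniques covered later, early stopping, uses exactly that signal: halt training once the validation loss stops improving. A minimal sketch of the stopping rule (the loss values and patience setting here are hypothetical):

```python
def early_stopping_step(val_losses, patience=3):
    """Return the step at which training would stop: the first step where the
    validation loss has failed to improve for `patience` consecutive checks."""
    best = float("inf")
    bad_checks = 0
    for step, loss in enumerate(val_losses):
        if loss < best:
            best, bad_checks = loss, 0  # new best: reset the counter
        else:
            bad_checks += 1
            if bad_checks >= patience:
                return step  # validation loss has stalled; stop here
    return len(val_losses) - 1  # never triggered; train to the end

# Validation loss improves, then drifts upward as the model overfits.
stop = early_stopping_step([1.0, 0.8, 0.7, 0.72, 0.75, 0.74, 0.9])
```

The design choice worth noting is the `patience` parameter: validation loss is noisy, so we tolerate a few non-improving checks rather than stopping at the first uptick.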