One of the more painful things about training deep neural networks is the large number of hyperparameters you have to deal with. These could be the learning rate α, the discounting factor ρ and epsilon ε if you are using the RMSprop optimizer (Hinton et al.), or the exponential decay rates β₁ and β₂ if you are using the Adam optimizer (Kingma & Ba). You also need to choose the number of layers in the network and the number of hidden units in each layer. You might be using learning rate schedulers and want to configure those as well, and a lot more! We definitely need ways to organize our hyperparameter tuning process better. A common algorithm I use to organize my hyperparameter search is Random Search.
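The idea behind Random Search is simple: instead of sweeping a grid, sample each hyperparameter independently from a chosen range and keep the best-scoring configuration. Here is a minimal sketch; the search space (ranges, layer counts, unit sizes) and the `train_and_evaluate` callback are illustrative assumptions, not values from this article.

```python
import random

# Hypothetical search space -- the ranges and choices below are assumptions.
SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),  # sample on a log scale
    "num_layers": lambda: random.randint(1, 5),
    "hidden_units": lambda: random.choice([32, 64, 128, 256]),
}

def random_search(train_and_evaluate, num_trials=20):
    """Sample random configurations and keep the highest-scoring one."""
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = {name: sample() for name, sample in SEARCH_SPACE.items()}
        score = train_and_evaluate(config)  # e.g. validation accuracy
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

You supply `train_and_evaluate(config)`, which trains a model with that configuration and returns a score to maximize. Note how the learning rate is sampled on a log scale: it varies over orders of magnitude, so uniform sampling in log space explores the range far more evenly.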
You can easily create learning curves for your deep learning models. First, update your call to the fit function to include a validation dataset. This is a portion of the training set that is not used to fit the model; instead, it is used to evaluate the model's performance during training.
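In Keras, for example, you would pass `validation_data=(X_val, y_val)` to `model.fit` and plot the recorded `loss` and `val_loss`. The sketch below mimics that bookkeeping in a self-contained way: a tiny linear model trained by gradient descent on synthetic data, recording train and validation loss each epoch, which are exactly the two curves a learning-curve plot shows.

```python
import numpy as np

# Synthetic regression data (all values here are made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold out the last 20% of the training data as the validation set.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

w = np.zeros(3)
history = {"loss": [], "val_loss": []}
for epoch in range(100):
    # One gradient-descent step on the mean squared error of the training set.
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(X_train)
    w -= 0.1 * grad
    # Record both losses; the validation set never influences the update.
    history["loss"].append(float(np.mean((X_train @ w - y_train) ** 2)))
    history["val_loss"].append(float(np.mean((X_val @ w - y_val) ** 2)))
```

Plotting `history["loss"]` against `history["val_loss"]` is what lets you diagnose underfitting and overfitting: a validation curve that flattens or rises while the training curve keeps falling is the classic overfitting signature.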
Along the way, I'll share personal commentary, stories from established deep learning practitioners, and code snippets. Let's start by looking at the common points where a neural network can fail. You may one-hot encode the categorical features to better represent the relationship, or you may just keep them as they are. There are two continuous features in the dataset: Age and Bounties. They vary widely in scale, so you would want to standardize them. There are several ways to initialize the weights in a neural network. You can start with all zeros (which isn't advisable, as we will see in a second), you can randomly initialize them, or you can choose a technique like Xavier initialization or He initialization. If you go with the Xavier or the He scheme, you need to choose the activation functions accordingly.
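Both of these steps, standardizing wide-ranging features and scaling initial weights by layer size, can be sketched in a few lines of numpy. The feature ranges below are made-up stand-ins for Age and Bounties, and the initializers implement the standard Glorot/Xavier-uniform and He-normal formulas.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical columns standing in for Age and Bounties (ranges are made up).
age = rng.uniform(18, 90, size=500)
bounties = rng.uniform(0, 1_000_000, size=500)
X = np.column_stack([age, bounties])

# Standardize each feature to zero mean and unit variance so neither
# dominates the gradients purely because of its scale.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

def xavier_init(fan_in, fan_out):
    """Glorot/Xavier uniform: variance scaled by fan_in + fan_out.

    Pairs well with tanh or sigmoid activations.
    """
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    """He normal: variance 2 / fan_in; pairs well with ReLU-family activations."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```

The pairing matters because each scheme is derived for a particular activation: Xavier keeps the variance of activations roughly constant across layers under tanh/sigmoid, while He's factor of 2 compensates for ReLU zeroing out half of its inputs.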