Training neural networks can be very confusing! How many hidden layers should your network have? Why are your gradients vanishing? In this post, we'll pull back the curtain on some of the more confusing aspects of neural nets and help you make smart decisions about your neural network architecture. I highly recommend forking this kernel and playing with the different building blocks to hone your intuition.

This is the fifth post (post 1, post 2, post 3, post 4) in the series that I am writing based on the book First contact with DEEP LEARNING, Practical introduction with Keras. In it I will present an intuitive view of the main components of the learning process of a neural network, and put some of these concepts into practice with an interactive tool called TensorFlow Playground.

Remember that a neural network is made up of neurons connected to each other. Each connection is associated with a weight, which is multiplied by the input value and so dictates how much that input matters to the receiving neuron. Each neuron also has an activation function that defines its output and introduces non-linearity into the modeling capabilities of the network. There are several options for activation functions, which we will present in this post. Training our neural network, that is, learning the values of its parameters (weights wij and biases bj), is the most genuine part of Deep Learning. We can see this learning process as an iterative "going and returning" through the layers of neurons: the "going" is a forward propagation of the information and the "returning" is a backpropagation of the information. The first phase, forward propagation, occurs when the network is exposed to the training data: the data pass through the entire network so that its predictions (labels) can be calculated.
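To make the forward phase concrete, here is a minimal sketch in plain Python (the book itself uses Keras, so this is only an illustration). The layer sizes, the weight and bias values, the choice of ReLU and sigmoid, and the helper names `dense_forward` and `forward` are all assumptions made for the example, not something taken from the book.

```python
import math


def relu(z):
    """ReLU activation: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(inputs, weights, biases, activation):
    """Compute the outputs of one fully connected layer.

    weights[j][i] is the weight on the connection from input i to
    neuron j (an indexing convention assumed for this sketch), and
    biases[j] is the bias of neuron j.
    """
    outputs = []
    for w_j, b_j in zip(weights, biases):
        # Weighted sum of the inputs plus the bias, then the activation.
        z = sum(w * x for w, x in zip(w_j, inputs)) + b_j
        outputs.append(activation(z))
    return outputs


def forward(inputs, layers):
    """Forward propagation: feed the data through every layer in order."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense_forward(activations, weights, biases, activation)
    return activations


# A hypothetical network: 2 inputs -> 3 hidden neurons (ReLU) -> 1 output (sigmoid).
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], relu),
    ([[0.7, -0.5, 0.2]], [0.05], sigmoid),
]

prediction = forward([1.0, 2.0], layers)
print(prediction)  # a single value between 0 and 1
```

In a real training loop this prediction would be compared with the true label, and the resulting error is what the "returning" phase, backpropagation, sends back through the layers.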

In this post, I am going to write about the general blueprint to follow for any deep learning model. I am not going in depth into deep learning concepts here; instead, this is a basic sequence of steps that can be followed to develop neural networks. Steps may be added to or removed from the list below depending on the requirements. The data we get for modeling are usually raw and unstructured, and much of it is not needed for the task at hand. The first step in modeling a neural network is weight initialization, and it is an extremely important one: if the weights are poorly initialized, training may converge very slowly or fail to converge at all, whereas a good initialization lets optimization reach a minimum in far less time.
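As a rough illustration of what initializing weights properly can look like, here is a small sketch of two widely used schemes, Glorot (Xavier) uniform and He normal. The post does not say which scheme it has in mind, so the choice of these two, the layer sizes, and the function names are assumptions made for the example.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform: sample from U(-limit, limit).

    limit = sqrt(6 / (fan_in + fan_out)); commonly paired with
    tanh or sigmoid activations.
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def he_normal(fan_in, fan_out, rng):
    """He normal: sample from N(0, std^2) with std = sqrt(2 / fan_in).

    Commonly paired with ReLU activations.
    """
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


# Hypothetical layer: 64 inputs feeding 32 neurons.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
w_tanh = glorot_uniform(64, 32, rng)
w_relu = he_normal(64, 32, rng)

print(len(w_relu), len(w_relu[0]))  # 32 rows (neurons), 64 weights each
```

Both schemes scale the random weights by the size of the layer, which helps keep activations and gradients from shrinking or blowing up as they pass through many layers. Biases are typically just initialized to zero.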

"Artificial Intelligence, deep learning, machine learning -- whatever you're doing if you don't understand it -- learn it. Because otherwise you're going to be a dinosaur within 3 years." This statement from Mark Cuban might sound drastic, but its message is spot on! We are in the middle of a revolution, one driven by huge amounts of data and a ton of computational power. For a minute, think how people in the early 20th century would have felt if they did not understand electricity. They would have been used to doing things in a particular manner for ages, and all of a sudden things around them started changing.

Using Deep Neural Networks for regression problems might seem like overkill (and quite often is), but in some cases where you have a significant amount of high-dimensional data they can outperform any other ML model. When you learn about Neural Networks you usually start with an image classification problem like the MNIST dataset. This is an obvious choice, as advanced tasks with high-dimensional data are where DNNs really thrive. Surprisingly, when you try to apply what you learned on MNIST to a regression task, you might struggle for a while before your super-advanced DNN model is any better than a basic Random Forest Regressor. In this guide, I list some key tips and tricks I learned while using DNNs for regression problems. The data is a set of nearly 50 features describing 25k properties in Warsaw. The code and data source used for this article can be found on GitHub.