How to choose an activation function for your network

#artificialintelligence 

This is the third post in the optimization series, where we are trying to give the reader a comprehensive review of optimization in deep learning. So far, we have looked at how Mini Batch Gradient Descent is used to combat local minima and saddle points, and how adaptive methods like Momentum, RMSProp and Adam augment vanilla Gradient Descent to address the problem of pathological curvature.

Neural networks, unlike the machine learning methods that came before them, do not rest upon any probabilistic or statistical assumptions about the data they are fed. However, one of the most important requirements, if not the most important, for a neural network to learn properly is that the data fed to its layers exhibits certain properties. In this article, we will cover problems No. 1 and 2, and how activation functions are used to address them.
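As a neutral reference point for the discussion of activation functions, here is a minimal sketch of three common choices (sigmoid, tanh and ReLU) together with their derivatives. The derivative matters because, by the chain rule, it multiplies the gradient flowing backward through each layer. The function names below are our own illustrative choices, not any library's API. Inputs are assumed to be plain Python floats, and ReLU's derivative at exactly zero is set to 0 by convention (1 is also used).

```python
import math

# Illustrative sketch: scalar activation functions and their derivatives.
# Names are hypothetical helpers, not taken from any framework.

def sigmoid(x: float) -> float:
    # Numerically stable form: avoids overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value 0.25, reached at x == 0

def tanh(x: float) -> float:
    return math.tanh(x)

def tanh_grad(x: float) -> float:
    return 1.0 - math.tanh(x) ** 2  # maximum value 1.0, reached at x == 0

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # Assumed convention: derivative at exactly 0 is taken as 0.
    return 1.0 if x > 0 else 0.0

if __name__ == "__main__":
    # Compare how much gradient each function lets through at a few inputs.
    for x in (-5.0, 0.0, 5.0):
        print(f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.4f}  "
              f"tanh'={tanh_grad(x):.4f}  relu'={relu_grad(x):.1f}")
```

Running it shows the sigmoid and tanh derivatives shrinking toward zero as the input moves away from zero in either direction, while ReLU's derivative stays at 1 for every positive input and is 0 for every negative one.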
