How to choose an activation function for your network

Jan-4-2019, 03:03:50 GMT–#artificialintelligence

This is the third post in the optimization series, in which we are trying to give the reader a comprehensive review of optimization in deep learning. So far, we have looked at how Mini-Batch Gradient Descent is used to combat local minima and saddle points, and how adaptive methods like Momentum, RMSProp and Adam augment vanilla Gradient Descent to address the problem of pathological curvature. Neural networks, unlike the machine learning methods that came before them, do not rest upon any probabilistic or statistical assumptions about the data they are fed. However, one of the most important elements, if not the most important, required to ensure that neural networks learn properly is that the data fed to the layers of a neural network exhibit certain properties. In this article, we will cover problems No. 1 and 2, and how activation functions are used to address them.

artificial intelligence, deep learning, machine learning


Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.55)
  - Neural Networks > Deep Learning (0.35)
