TanhSoft -- a family of activation functions combining Tanh and Softplus

Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey

arXiv.org Artificial Intelligence 

Artificial neural networks (ANNs) have occupied center stage in deep learning in recent years. An ANN is built from several hidden layers, and each hidden layer consists of several neurons. At each neuron, an affine linear map is composed with a nonlinear function known as the activation function. During training, the linear maps are optimized, whereas the activation function is usually fixed at the outset along with the architecture of the ANN. There has been growing interest in developing a methodical understanding of activation functions, in particular with regard to constructing novel activation functions and identifying the mathematical properties that lead to better learning [1]. An activation function is considered good if it accelerates learning and leads to better convergence, which in turn yields more accurate results. In the early days of deep learning research, researchers used shallow networks (with few hidden layers), and tanh or sigmoid served as activation functions. As research progressed and deeper networks (with many hidden layers) came into fashion for challenging tasks, the Rectified Linear Unit (ReLU) ([2], [3], [4]) emerged as the most popular activation function. Despite its simplicity, deep neural networks with ReLU have learned many complex and highly nonlinear functions with high accuracy.
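The sketch below illustrates, in NumPy, the building blocks mentioned above: the activation functions tanh, softplus, and ReLU, and a single neuron as an affine map followed by a fixed nonlinearity. The abstract does not state the exact TanhSoft definitions, so `tanhsoft_like` and its parameter `alpha` are only an illustrative, hypothetical way of multiplying a scaled tanh with softplus, not the paper's actual formula.

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|))
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def relu(x):
    return np.maximum(x, 0.0)

def tanhsoft_like(x, alpha=0.75):
    # Illustrative combination only (assumption): the exact TanhSoft family
    # is defined in the paper, not in this abstract. Here we simply compose
    # the two named building blocks, tanh and softplus, multiplicatively.
    return np.tanh(alpha * x) * softplus(x)

def neuron(x, w, b, activation=relu):
    # A single neuron: affine linear map followed by a fixed activation.
    return activation(np.dot(w, x) + b)

if __name__ == "__main__":
    x = np.linspace(-3, 3, 7)
    print("tanh     :", np.round(tanh(x), 3))
    print("softplus :", np.round(softplus(x), 3))
    print("relu     :", np.round(relu(x), 3))
    print("combined :", np.round(tanhsoft_like(x), 3))
```

During training, only the affine parameters (`w`, `b`) are optimized; the activation function stays fixed, which is why its choice matters for convergence and accuracy.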
