Orthogonal-Padé Activation Functions: Trainable Activation Functions for Smooth and Faster Convergence in Deep Networks

Koushik Biswas, Shilpak Banerjee, Ashish Kumar Pandey

arXiv.org Artificial Intelligence 

Deep networks are constructed with multiple hidden layers and neurons. Non-linearity is introduced into the network via an activation function in each neuron. ReLU [1], proposed by Nair and Hinton, is the favourite activation in the deep learning community due to its simplicity. However, ReLU has a drawback known as dying ReLU: up to 50% of neurons can become dead due to the vanishing gradient problem, i.e. numerous neurons have no impact on network performance. To overcome this problem, Leaky ReLU [2], Parametric ReLU [3], ELU [4], and Softplus [5] were later proposed. They improved network performance, though finding the best activation function remains an open problem for researchers. Recently, Swish [6] was found by a group of researchers from Google Brain using an automated search technique, and it has shown some improvement in accuracy over ReLU.
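To make the comparison concrete, the sketch below implements the standard definitions of the activations named above (ReLU, Leaky ReLU, ELU, Softplus, Swish) and a hypothetical trainable Padé-style rational activation f(x) = P(x)/Q(x) with learnable coefficients. This is a minimal illustration under the assumption of a monomial-basis rational form; it is not the paper's Orthogonal-Padé implementation, which builds P and Q from orthogonal polynomial bases.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def relu(x):                       # ReLU: max(0, x)
    return torch.relu(x)

def leaky_relu(x, slope=0.01):     # Leaky ReLU: small slope for x < 0
    return F.leaky_relu(x, negative_slope=slope)

def elu(x, alpha=1.0):             # ELU: alpha * (exp(x) - 1) for x < 0
    return F.elu(x, alpha=alpha)

def softplus(x):                   # Softplus: log(1 + exp(x)), a smooth ReLU
    return F.softplus(x)

def swish(x, beta=1.0):            # Swish: x * sigmoid(beta * x)
    return x * torch.sigmoid(beta * x)

class PadeActivation(nn.Module):
    """Hypothetical trainable rational activation f(x) = P(x) / Q(x).

    P has degree m and Q has degree n; all coefficients are learned by
    backpropagation along with the network weights. The denominator is
    written as 1 + |sum_j b_j x^j| to avoid division by zero. The paper's
    Orthogonal-Padé variant replaces the raw monomial basis used here with
    orthogonal polynomial bases.
    """
    def __init__(self, m=5, n=4):
        super().__init__()
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)  # numerator coefficients
        self.b = nn.Parameter(torch.randn(n) * 0.1)      # denominator coefficients

    def forward(self, x):
        num = sum(a_k * x ** k for k, a_k in enumerate(self.a))
        den = 1.0 + torch.abs(sum(b_j * x ** (j + 1)
                                  for j, b_j in enumerate(self.b)))
        return num / den
```

Because the coefficients are ordinary parameters, such a unit can be dropped into a model in place of a fixed activation and trained end to end with the usual optimizer.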
