Orthogonal-Padé Activation Functions: Trainable Activation Functions for Smooth and Faster Convergence in Deep Networks
Koushik Biswas, Shilpak Banerjee, Ashish Kumar Pandey
Deep networks are constructed with multiple hidden layers and neurons, and non-linearity is introduced into the network via an activation function in each neuron. ReLU [1], proposed by Nair and Hinton, is the favourite activation in the deep learning community due to its simplicity. However, ReLU suffers from a drawback known as dying ReLU: up to 50% of neurons can become inactive due to the vanishing gradient problem, i.e. numerous neurons end up having no impact on the network's output. To overcome this problem, Leaky ReLU [2], Parametric ReLU [3], ELU [4], and Softplus [5] were later proposed. They have improved network performance, although finding the best activation function remains an open problem. Recently, Swish [6] was discovered by a group of researchers from Google Brain using an automated search technique, and it has shown some improvement in accuracy over ReLU.
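For reference, the activations mentioned above have simple closed forms. The sketch below (PyTorch, an assumed framework not named in the abstract) collects them alongside a minimal trainable rational (Padé-style) activation. The `PadeActivation` class and its default degrees are an illustrative construction, not the authors' exact Orthogonal-Padé formulation, which builds on orthogonal polynomial bases.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fixed activations mentioned in the abstract (standard definitions).
def relu(x):       return F.relu(x)              # max(0, x); gradient is 0 for x < 0 ("dying ReLU")
def leaky_relu(x): return F.leaky_relu(x, 0.01)  # small negative slope avoids dead neurons
def elu(x):        return F.elu(x)               # smooth exponential saturation for x < 0
def softplus(x):   return F.softplus(x)          # log(1 + exp(x)), a smooth ReLU
def swish(x):      return x * torch.sigmoid(x)   # Swish / SiLU: x * sigmoid(x)

class PadeActivation(nn.Module):
    """Minimal trainable rational activation: P(x) / (1 + |Q(x)|).

    Illustrative sketch only; coefficient counts and initialization are
    hypothetical and do not reproduce the paper's orthogonal-basis construction.
    """
    def __init__(self, num_degree=5, den_degree=4):
        super().__init__()
        # Trainable numerator and denominator coefficients.
        self.p = nn.Parameter(torch.randn(num_degree + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(den_degree) * 0.1)

    def forward(self, x):
        num = sum(c * x**i for i, c in enumerate(self.p))
        den = 1.0 + torch.abs(sum(c * x**(i + 1) for i, c in enumerate(self.q)))
        return num / den

# Usage: drop in wherever a fixed activation would otherwise go.
act = PadeActivation()
y = act(torch.linspace(-3.0, 3.0, 7))
```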
arXiv.org Artificial Intelligence
Jun-17-2021