Orthogonal-Padé Activation Functions: Trainable Activation Functions for Smooth and Faster Convergence in Deep Networks
Koushik Biswas, Shilpak Banerjee, Ashish Kumar Pandey
Deep networks are constructed with multiple hidden layers and neurons, and non-linearity is introduced into the network via an activation function in each neuron. ReLU [1], proposed by Nair and Hinton, is the favourite activation in the deep learning community due to its simplicity. However, ReLU suffers from a drawback known as dying ReLU: up to 50% of neurons can become inactive due to the vanishing gradient problem, i.e. numerous neurons end up having no impact on the network's output. To overcome this problem, Leaky ReLU [2], Parametric ReLU [3], ELU [4], and Softplus [5] were later proposed. They have improved network performance, although finding the best activation function remains an open problem. Recently, Swish [6] was discovered by a group of researchers from Google Brain using an automated search technique, and it has shown some improvement in accuracy over ReLU.
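For reference, the activations mentioned above have simple closed forms. The sketch below (PyTorch, an assumed framework not named in the abstract) collects them alongside a minimal trainable rational (Padé-style) activation. The `PadeActivation` class and its default degrees are an illustrative construction, not the authors' exact Orthogonal-Padé formulation, which builds on orthogonal polynomial bases.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fixed activations mentioned in the abstract (standard definitions).
def relu(x):       return F.relu(x)              # max(0, x); gradient is 0 for x < 0 ("dying ReLU")
def leaky_relu(x): return F.leaky_relu(x, 0.01)  # small negative slope avoids dead neurons
def elu(x):        return F.elu(x)               # smooth exponential saturation for x < 0
def softplus(x):   return F.softplus(x)          # log(1 + exp(x)), a smooth ReLU
def swish(x):      return x * torch.sigmoid(x)   # Swish / SiLU: x * sigmoid(x)

class PadeActivation(nn.Module):
    """Minimal trainable rational activation: P(x) / (1 + |Q(x)|).

    Illustrative sketch only; coefficient counts and initialization are
    hypothetical and do not reproduce the paper's orthogonal-basis construction.
    """
    def __init__(self, num_degree=5, den_degree=4):
        super().__init__()
        # Trainable numerator and denominator coefficients.
        self.p = nn.Parameter(torch.randn(num_degree + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(den_degree) * 0.1)

    def forward(self, x):
        num = sum(c * x**i for i, c in enumerate(self.p))
        den = 1.0 + torch.abs(sum(c * x**(i + 1) for i, c in enumerate(self.q)))
        return num / den

# Usage: drop in wherever a fixed activation would otherwise go.
act = PadeActivation()
y = act(torch.linspace(-3.0, 3.0, 7))
```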
arXiv.org Artificial Intelligence
Jun-17-2021