Stable and Robust Deep Learning By Hyperbolic Tangent Exponential Linear Unit (TeLU)
Alfredo Fernandez, Ankur Mali
–arXiv.org Artificial Intelligence
In the rapidly evolving landscape of neural networks, the choice of activation function plays a pivotal role in model performance and stability. While the Rectified Linear Unit (ReLU) [6, 20] has long been the cornerstone of numerous deep learning architectures [25, 8, 26] due to its simplicity and effectiveness in mitigating the vanishing gradient problem [10, 11], it is not without limitations. In particular, ReLU suffers from the "dying ReLU" issue [18], where neurons can become inactive and cease to contribute to the learning process, potentially leading to suboptimal models. The Gaussian Error Linear Unit (GELU) [9] and Mish [19] activation functions have emerged as sophisticated alternatives that address some of ReLU's shortcomings. GELU, leveraging properties of the Gaussian distribution, offers a smooth, non-linear transition in its activation, which can lead to improved learning dynamics [27, 4, 15]. Mish builds on this idea with a self-gating mechanism that enables a smoother flow of information.
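To make the comparison concrete, the sketch below implements the activations discussed above in PyTorch and prints their values and gradients over a small input range. The ReLU, GELU, and Mish definitions are the standard ones; the TeLU form x · tanh(eˣ) is an assumption inferred from the unit's name, since this listing does not reproduce the paper's formula.

```python
import torch

def relu(x):
    # ReLU: zero gradient for x < 0, which can leave neurons permanently inactive ("dying ReLU")
    return torch.clamp(x, min=0.0)

def gelu(x):
    # GELU: x * Phi(x), a smooth gate based on the Gaussian CDF
    return x * 0.5 * (1.0 + torch.erf(x / 2.0 ** 0.5))

def mish(x):
    # Mish: x * tanh(softplus(x)), a smooth self-gated activation
    return x * torch.tanh(torch.nn.functional.softplus(x))

def telu(x):
    # Hypothetical TeLU form suggested by its name, not taken from the paper text:
    # x * tanh(exp(x)) -- approximately linear for large positive x,
    # smoothly saturating toward zero for negative x.
    return x * torch.tanh(torch.exp(x))

x = torch.linspace(-4.0, 4.0, steps=9, requires_grad=True)
for name, fn in [("ReLU", relu), ("GELU", gelu), ("Mish", mish), ("TeLU", telu)]:
    y = fn(x)
    grad, = torch.autograd.grad(y.sum(), x)
    print(f"{name:5s} values:    {y.detach().numpy().round(3)}")
    print(f"{name:5s} gradients: {grad.numpy().round(3)}")
```

The printed gradients illustrate the contrast drawn in the abstract: ReLU's gradient is exactly zero on the negative side, while the smooth, self-gated activations keep small non-zero gradients there.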
Feb-5-2024