Stable and Robust Deep Learning By Hyperbolic Tangent Exponential Linear Unit (TeLU)
Alfredo Fernandez, Ankur Mali
–arXiv.org Artificial Intelligence
In the rapidly evolving landscape of neural networks, the choice of activation function plays a pivotal role in model performance and stability. While the Rectified Linear Unit (ReLU) [6, 20] has long been the cornerstone of numerous deep learning architectures [25, 8, 26] due to its simplicity and effectiveness in mitigating the vanishing gradient problem [10, 11], it is not without limitations. In particular, ReLU suffers from the "dying ReLU" issue [18], where neurons can become inactive and cease to contribute to the learning process, potentially leading to suboptimal models. The Gaussian Error Linear Unit (GELU) [9] and Mish [19] activation functions have emerged as sophisticated alternatives that address some of ReLU's shortcomings. GELU, leveraging properties of the Gaussian distribution, offers a smooth, non-linear transition in its activation, which can lead to improved learning dynamics [27, 4, 15]. Mish builds on this idea with a self-gating mechanism that enables a smoother flow of information.
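To make the comparison concrete, the sketch below implements the activations discussed above in PyTorch and prints their values and gradients over a small input range. The ReLU, GELU, and Mish definitions are the standard ones; the TeLU form x · tanh(eˣ) is an assumption inferred from the unit's name, since this listing does not reproduce the paper's formula.

```python
import torch

def relu(x):
    # ReLU: zero gradient for x < 0, which can leave neurons permanently inactive ("dying ReLU")
    return torch.clamp(x, min=0.0)

def gelu(x):
    # GELU: x * Phi(x), a smooth gate based on the Gaussian CDF
    return x * 0.5 * (1.0 + torch.erf(x / 2.0 ** 0.5))

def mish(x):
    # Mish: x * tanh(softplus(x)), a smooth self-gated activation
    return x * torch.tanh(torch.nn.functional.softplus(x))

def telu(x):
    # Hypothetical TeLU form suggested by its name, not taken from the paper text:
    # x * tanh(exp(x)) -- approximately linear for large positive x,
    # smoothly saturating toward zero for negative x.
    return x * torch.tanh(torch.exp(x))

x = torch.linspace(-4.0, 4.0, steps=9, requires_grad=True)
for name, fn in [("ReLU", relu), ("GELU", gelu), ("Mish", mish), ("TeLU", telu)]:
    y = fn(x)
    grad, = torch.autograd.grad(y.sum(), x)
    print(f"{name:5s} values:    {y.detach().numpy().round(3)}")
    print(f"{name:5s} gradients: {grad.numpy().round(3)}")
```

The printed gradients illustrate the contrast drawn in the abstract: ReLU's gradient is exactly zero on the negative side, while the smooth, self-gated activations keep small non-zero gradients there.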
Feb-5-2024