Supplementary Figure 6: Samples generated by DCGAN with ReLU, Swish, and ASH activation functions using the celebA dataset.

Supplementary Figure 6 illustrates samples generated by DCGAN (Radford et al., 2015) with ReLU, Swish, and ASH activation functions on the celebA dataset (Yang et al., 2015). The ASH activation function, which rectifies the top-k% percentile, can be modified into various versions.
Appendix . Stochastic Adaptive Activation Function
Intuitively, the ASH activation function is a threshold-based activation function that rectifies its inputs, and it has the following properties:

Property 1. The ASH activation function is parametric. In early layers it exhibits a small threshold (large percentile) to retain substantial information, whereas in deeper layers it exhibits a comparatively small percentile to rectify futile information.

Property 2. The ASH activation function produces output that depends on the context of its input.

Supplementary Figure 1 illustrates the training graph of loss values and validation accuracies; the y-axis covers the range (0, 0.8).

Appendix D. Classification task

Supplementary Figure 1 illustrates GRAD-CAM (Selvaraju et al., 2017) samples obtained with ResNet-164 and DenseNet models using the ReLU, Swish, and ASH activation functions in the classification task. Property 1 is clearly illustrated in Supplementary Figure 1.
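The top-k% rectification at the heart of ASH can be sketched in plain Python. This is a rough illustration of the thresholding idea only, not the authors' exact formulation (in particular, the stochastic element suggested by the appendix title is omitted); the function name and the list-based interface are invented for the example.

```python
def ash_sketch(values, k=40.0):
    """Keep the largest k% of activations and zero out the rest.

    A larger k implies a smaller effective threshold (more inputs
    retained), matching the early-layer behavior of Property 1.
    """
    keep = max(1, round(len(values) * k / 100.0))       # how many survive
    threshold = sorted(values, reverse=True)[keep - 1]  # k-th largest value
    return [v if v >= threshold else 0.0 for v in values]
```

With k = 40 on the inputs [-2, -1, 0, 1, 2], only the two largest values survive; raising k toward 100 retains progressively more of the input, including weak and negative responses.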
RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function
Yu, Yunrui, Wang, Kafeng, Su, Hang, Zhu, Jun
Despite their widespread success, deep neural networks remain critically vulnerable to adversarial attacks, posing significant risks in safety-sensitive applications. This paper investigates activation functions as a crucial yet underexplored component for enhancing model robustness. We propose a Rademacher Complexity Reduction Activation Function (RCR-AF), a novel activation function designed to improve both generalization and adversarial resilience. RCR-AF uniquely combines the advantages of GELU (including smoothness, gradient stability, and negative information retention) with ReLU's desirable monotonicity, while simultaneously controlling both model sparsity and capacity through built-in clipping mechanisms governed by two hyperparameters, $α$ and $γ$. Our theoretical analysis, grounded in Rademacher complexity, demonstrates that these parameters directly modulate the model's Rademacher complexity, offering a principled approach to enhancing robustness. Comprehensive empirical evaluations show that RCR-AF consistently outperforms widely-used alternatives (ReLU, GELU, and Swish) in both clean accuracy under standard training and adversarial robustness within adversarial training paradigms.
Toward Improving fNIRS Classification: A Study on Activation Functions in Deep Neural Architectures
Adeli, Behtom, McLinden, John, Pandey, Pankaj, Shao, Ming, Shahriari, Yalda
Activation functions are critical to the performance of deep neural networks, particularly in domains such as functional near-infrared spectroscopy (fNIRS), where nonlinearity, low signal-to-noise ratio (SNR), and signal variability pose significant challenges to model accuracy. However, the impact of activation functions on deep learning (DL) performance in the fNIRS domain remains underexplored and lacks systematic investigation in the current literature. This study evaluates a range of conventional and field-specific activation functions for fNIRS classification tasks using multiple deep learning architectures, including the domain-specific fNIRSNet, AbsoluteNet, MDNN, and shallowConvNet (as the baseline), all tested on a single dataset recorded during an auditory task. To ensure a fair comparison, all networks were trained and tested using standardized preprocessing and consistent training parameters. The results show that symmetrical activation functions such as Tanh and the absolute value function Abs(x) can outperform commonly used functions like the Rectified Linear Unit (ReLU), depending on the architecture. Additionally, a focused analysis of the role of symmetry was conducted using a Modified Absolute Function (MAF), with results further supporting the effectiveness of symmetrical activation functions in delivering performance gains. These findings underscore the importance of selecting activation functions that align with the signal characteristics of fNIRS data.
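The role of symmetry discussed in this abstract can be made concrete with a toy comparison (the MAF itself is not specified in the abstract, so only Abs(x) and ReLU are shown): an even function such as Abs(x) responds identically to positive and negative deflections of a zero-mean signal, while ReLU discards the negative phase entirely.

```python
def relu(x):
    return max(0.0, x)

def absolute(x):
    # symmetric (even) activation: f(-x) == f(x)
    return abs(x)

# A zero-mean deflection, as might appear in an fNIRS hemodynamic response.
signal = [-0.5, 0.5]
print([absolute(v) for v in signal])  # both phases preserved
print([relu(v) for v in signal])      # negative phase zeroed out
```

For signals whose informative content lies in the magnitude of deflections of either sign, this sign-invariance is one plausible explanation for the reported gains of symmetrical activations.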
Tangma: A Tanh-Guided Activation Function with Learnable Parameters
Activation functions are key to effective backpropagation and expressiveness in deep neural networks. This work introduces Tangma, a new activation function that combines the smooth shape of the hyperbolic tangent with two learnable parameters -- α, which shifts the curve's inflection point to adjust neuron activation, and γ, which adds linearity to preserve weak gradients and improve training stability. Tangma was evaluated on MNIST and CIFAR-10 using custom networks composed of convolutional and linear layers and compared against ReLU, Swish, and GELU. On MNIST, Tangma achieved the highest validation accuracy of 99.09% and the lowest validation loss, demonstrating faster and more stable convergence than the baselines. On CIFAR-10, Tangma reached a top validation accuracy of 78.15%, outperforming all other activation functions while maintaining a competitive training loss. Furthermore, Tangma showed improved training efficiency with lower average epoch runtimes compared to Swish and GELU. These results show that Tangma performs well on standard vision tasks and offers reliable, efficient training. Its learnable design gives more control over activation behavior, which may help larger models learn more consistently in tasks such as image recognition or language modeling.
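The description of Tangma admits a simple reading, sketched below. The paper's exact definition may differ from this hypothetical form; it is built only from the stated roles of the two parameters: a tanh curve shifted by α plus a γ-scaled linear term.

```python
import math

def tangma_sketch(x, alpha=0.0, gamma=0.1):
    """Hypothetical form (not the published definition): tanh shifted
    by alpha (moves the inflection point) plus gamma * x (adds
    linearity so weak gradients survive even where tanh saturates)."""
    return math.tanh(x + alpha) + gamma * x
```

Under this reading the derivative is sech²(x + α) + γ, which never falls below γ; that floor on the gradient is one way a linear term could yield the training stability the abstract reports.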
SG-Blend: Learning an Interpolation Between Improved Swish and GELU for Robust Neural Representations
Sarkar, Gaurav, Gala, Jay, Tripathi, Subarna
The design of activation functions remains a pivotal component in optimizing deep neural networks. While prevailing choices like Swish and GELU demonstrate considerable efficacy, they often exhibit domain-specific optima. This work introduces SG-Blend, a novel activation function that blends our proposed SSwish, a first-order symmetric variant of Swish, and the established GELU through dynamic interpolation. By adaptively blending these constituent functions via learnable parameters, SG-Blend aims to harness their complementary strengths: SSwish's controlled non-monotonicity and symmetry, and GELU's smooth, probabilistic profile, to achieve a more universally robust balance between model expressivity and gradient stability. We conduct comprehensive empirical evaluations across diverse modalities and architectures, showing performance improvements across all considered natural language and computer vision tasks and models. These results, achieved with negligible computational overhead, underscore SG-Blend's potential as a versatile, drop-in replacement that consistently outperforms strong contemporary baselines. The code is available at https://anonymous.4open.science/r/SGBlend-6CBC.
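The blending mechanism can be sketched as follows. SSwish is not defined in the abstract, so plain Swish stands in for it here, and the sigmoid gate over a learnable scalar is likewise an assumption of this sketch rather than the paper's parameterization.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    # stand-in for the paper's SSwish, which the abstract does not define
    return x * sigmoid(x)

def gelu(x):
    # exact GELU via the Gaussian CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def sg_blend_sketch(x, w=0.0):
    """Learnable interpolation: a scalar w is squashed to a mixing
    coefficient in (0, 1), so training can slide the activation
    between the two constituent curves."""
    g = sigmoid(w)
    return g * swish(x) + (1.0 - g) * gelu(x)
```

Driving w to a large positive or negative value recovers one constituent almost exactly, so the blend can fall back to either function when that suits the task.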
Deep Learning Activation Functions: Fixed-Shape, Parametric, Adaptive, Stochastic, Miscellaneous, Non-Standard, Ensemble
In the architecture of deep learning models, inspired by biological neurons, activation functions (AFs) play a pivotal role. They significantly influence the performance of artificial neural networks. By modulating the non-linear properties essential for learning complex patterns, AFs are fundamental in both classification and regression tasks. This paper presents a comprehensive review of various types of AFs, including fixed-shape, parametric, adaptive, stochastic/probabilistic, non-standard, and ensemble/combining types. We begin with a systematic taxonomy and a detailed classification framework that delineates the principal characteristics of AFs and organizes them based on their structural and functional distinctions. Our in-depth analysis covers primary groups such as sigmoid-based, ReLU-based, and ELU-based AFs, discussing their theoretical foundations, mathematical formulations, and specific benefits and limitations in different contexts. We also highlight key attributes of AFs such as output range, monotonicity, and smoothness. Furthermore, we explore miscellaneous AFs that do not conform to these categories but have shown unique advantages in specialized applications. Non-standard AFs are also explored, showcasing cutting-edge variations that challenge traditional paradigms and offer enhanced adaptability and model performance. We examine strategies for combining multiple AFs to leverage complementary properties. The paper concludes with a comparative evaluation of 12 state-of-the-art AFs, using rigorous statistical and experimental methodologies to assess their efficacy. This analysis not only aids practitioners in selecting and designing the most appropriate AFs for their specific deep learning tasks but also encourages continued innovation in AF development within the machine learning community.
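The survey's first two categories can be illustrated with the smallest possible example: ReLU is fixed-shape (no trainable parameters), while PReLU is parametric (its negative-side slope is learned during training).

```python
def relu(x):
    # fixed-shape: the curve is identical in every network
    return max(0.0, x)

def prelu(x, a=0.25):
    # parametric: the negative slope `a` is a trainable parameter
    return x if x >= 0.0 else a * x
```

The adaptive and stochastic families in the taxonomy extend this idea further: their parameters vary with the input or are sampled during training rather than being fixed after fitting.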
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models
Chen, Yushuo, Tang, Tianyi, Xiang, Erge, Li, Linjiang, Zhao, Wayne Xin, Wang, Jing, Chai, Yunpeng, Wen, Ji-Rong
In the real world, large language models (LLMs) can serve as assistants that help users accomplish their jobs and can also support the development of advanced applications. For the wide application of LLMs, inference efficiency is an essential concern; it has been widely studied in existing work, and numerous optimization algorithms and code libraries have been proposed to improve it. Nonetheless, users still find it challenging to compare the effectiveness of all the above methods and understand the underlying mechanisms. In this work, we perform a detailed coarse-to-fine analysis of the inference performance of various code libraries. To evaluate the overall effectiveness, we examine four usage scenarios within two practical applications. We further provide both theoretical and empirical fine-grained analyses of each module in the Transformer architecture. Our experiments yield comprehensive results that are invaluable for researchers to evaluate code libraries and improve inference strategies.