Optimal Activation Functions for the Random Features Regression Model

Wang, Jianxin, Bento, José

arXiv.org Artificial Intelligence 

The asymptotic mean squared test error and sensitivity of the Random Features Regression (RFR) model have recently been studied. We build on this work and identify in closed form the family of Activation Functions (AFs) that minimize a combination of the test error and sensitivity of the RFR under different notions of functional parsimony. We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. Finally, we show how using optimal AFs impacts well-established properties of the RFR model, such as its double descent curve and the dependency of its optimal regularization parameter on the observation noise level.

For many neural network (NN) architectures, the test error does not monotonically increase as a model's complexity increases but can go down with the training error both at low and high complexity levels. This phenomenon, the double descent curve, defies intuition and has motivated new frameworks to explain it. Explanations have been advanced involving linear regression with random covariates (Belkin et al., 2020; Hastie et al., 2022), kernel regression (Belkin et al., 2019b; Liang & Rakhlin, 2020), the neural tangent kernel model (Jacot et al., 2018), and the Random Features Regression (RFR) model (Mei & Montanari, 2022). These frameworks allow queries beyond the generalization power of NNs. For example, they have been used to study networks' robustness properties (Hassani & Javanmard, 2022; Tripuraneni et al., 2021).

One aspect within reach and unstudied to this day is finding optimal Activation Functions (AFs) for these models. It is known that AFs affect a network's approximation accuracy, and efforts to optimize AFs have been undertaken. Previous work has justified the choice of AFs empirically, e.g., Ramachandran et al. (2017), or provided numerical procedures to learn AF parameters, sometimes jointly with models' parameters, e.g. See Rasamoelina et al. (2020) for commonly used AFs and Appendix C for how AFs have been previously derived. We derive for the first time closed-form optimal AFs such that an explicit objective function involving the asymptotic test error and sensitivity of a model is minimized. Setting aside empirical and principled but numerical methods, all past principled and analytical approaches to design AFs focus on non-accuracy-related considerations, e.g.
