rational neural network
Let $0 < \epsilon < 1$, $0 < \ell < 1$, $k \geq 1$, and $r$ be the Zolotarev sign function $Z_{3^k}(\cdot;\ell)$ of type $(3^k, 3^k - 1)$. One finds using the Karush-Kuhn-Tucker conditions [6] that $k_1 = \cdots = k_M = \lambda$. Proof of Lemma 2. Let $0 < \epsilon < 1$ and $R\colon [-1,1] \to [-1,1]$ be a rational function. Take $\tilde{R}(x) = R(2x - 1)$, which is still a rational function. Without loss of generality, we can assume that $R$ is an irreducible rational function (otherwise cancel factors until it is irreducible).
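A type of the form $(3^k, 3^k - 1)$ is what one obtains by composing $k$ rational maps of type $(3, 2)$. The sketch below checks this degree count with exact integer polynomial arithmetic. It is only an illustration under stated assumptions: the map $s(y) = (y^3 + 3y)/(3y^2 + 1)$ (one step of Halley's iteration for the sign function) is a stand-in chosen here, not the Zolotarev function from the snippet, and all helper names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    """Add two coefficient lists of possibly different lengths."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]


def degree(p):
    """Degree of a nonzero polynomial."""
    return max(i for i, c in enumerate(p) if c != 0)


def compose_step(num, den):
    """Apply s(y) = (y^3 + 3y) / (3y^2 + 1) to y = num/den.

    Algebraically, s(N/D) = (N^3 + 3 N D^2) / (D (3 N^2 + D^2)).
    """
    n2 = poly_mul(num, num)
    d2 = poly_mul(den, den)
    new_num = poly_add(poly_mul(n2, num), [3 * c for c in poly_mul(num, d2)])
    new_den = poly_mul(den, poly_add([3 * c for c in n2], d2))
    return new_num, new_den


def composed_type(k):
    """Degrees (numerator, denominator) of s composed with itself k times."""
    num, den = [0, 1], [1]  # start from the identity map x / 1
    for _ in range(k):
        num, den = compose_step(num, den)
    return degree(num), degree(den)


for k in range(1, 5):
    print(k, composed_type(k))
```

The degrees reported are those of the numerator and denominator as constructed by the recurrence; the code does not attempt to cancel common factors.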
Rational neural networks
We consider neural networks with rational activation functions. The choice of the nonlinear activation function in deep learning architectures is crucial and heavily impacts the performance of a neural network. We establish optimal bounds in terms of network complexity and prove that rational neural networks approximate smooth functions more efficiently than ReLU networks with exponentially smaller depth. The flexibility and smoothness of rational activation functions make them an attractive alternative to ReLU, as we demonstrate with numerical experiments.
$\mathcal{C}^1$-approximation with rational functions and rational neural networks
We show that suitably regular functions can be approximated in the $\mathcal{C}^1$-norm both with rational functions and rational neural networks, including approximation rates with respect to width and depth of the network, and degree of the rational functions. As a consequence of our results, we further obtain $\mathcal{C}^1$-approximation results for rational neural networks with the $\text{EQL}^÷$ and ParFam architecture, both of which are important in particular in the context of symbolic regression for physical law learning.
Supplementary Material of Rational neural networks
Finally, we use the identity $\mathrm{ReLU}(x) = \frac{|x| + x}{2}$, $x \in \mathbb{R}$, to define a rational approximation to the ReLU function on the interval $[-1, 1]$ as $\tilde{r}(x) = \frac{1}{2}\left(\frac{x\,r(x)}{1+\epsilon} + x\right)$. Therefore, we have the following inequalities for $x \in [-1, 1]$: $|\mathrm{ReLU}(x) - \tilde{r}(x)| = \frac{1}{2}\left|\,|x| - \frac{x\,r(x)}{1+\epsilon}\right| \leq \frac{1}{2(1+\epsilon)}\left(\bigl||x| - x\,r(x)\bigr| + \epsilon|x|\right) \leq \frac{\epsilon}{1+\epsilon} \leq \epsilon$. We now show that ReLU neural networks can approximate rational functions. The structure of the proof closely follows [12, Lemma 1.3]. The statement of Theorem 3 comes in two parts, and we prove them separately.
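The construction above (build a ReLU approximant from a rational approximant of the sign function, then bound the error) can be checked numerically. The sketch below is a minimal illustration, not the supplement's method: it swaps in a composition of Halley steps for the sign function in place of the Zolotarev function, takes $\epsilon$ to be the largest value of $\bigl||x| - x\,r(x)\bigr|$ observed on a finite grid (an assumption, since the true supremum is not computed), and uses hypothetical function names throughout.

```python
def halley_sign_step(x):
    """One type (3, 2) rational step of Halley's iteration for sign(x)."""
    return x * (3.0 + x * x) / (1.0 + 3.0 * x * x)


def composed_sign(x, k):
    """Compose the Halley step k times; a stand-in for the sign approximant r."""
    for _ in range(k):
        x = halley_sign_step(x)
    return x


def relu(x):
    """ReLU written through the identity ReLU(x) = (|x| + x) / 2."""
    return (abs(x) + x) / 2.0


def rational_relu(x, k, eps):
    """Rational approximant 0.5 * (x * r(x) / (1 + eps) + x)."""
    return 0.5 * (x * composed_sign(x, k) / (1.0 + eps) + x)


# Uniform grid on [-1, 1]; eps and err are grid maxima, not true suprema.
grid = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
k = 5
eps = max(abs(abs(x) - x * composed_sign(x, k)) for x in grid)
err = max(abs(relu(x) - rational_relu(x, k, eps)) for x in grid)
bound = eps / (1.0 + eps)

print(f"eps = {eps:.3e}, max error = {err:.3e}, bound = {bound:.3e}")
```

On the grid, the measured error stays below $\epsilon/(1+\epsilon)$, as the triangle-inequality argument predicts; increasing `k` shrinks `eps` and hence the bound.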
Review for NeurIPS paper: Rational neural networks
Additional Feedback: This work proposes a new activation function to serve deep learning architectures, providing a theoretical study of its complexity. This paper is well written and offers a high level of readability to most readers in the data mining community. However, the article would be significantly enhanced if the issues related to its motivation, technical analysis, and experiments were addressed. Detailed comments are given in the following: 1) Motivation – This paper proposes the rational activation function as an alternative to ReLU, potentially avoiding the vanishing gradient problem * The problem raised in this paper, i.e., some existing activation functions (e.g., sigmoid, logistic) can only handle smooth signals, is a significant problem in deep neural network optimization since their derivatives are zero for large values. A low degree can save time, but is there any better configuration, and why choose this type?
Review for NeurIPS paper: Rational neural networks
The paper studies rational DNNs --- deep neural networks where rational functions (of small degrees) are used as non-linearities. The paper provides many interesting theoretical results on the approximation properties of rational DNNs (specifically, in comparison to ReLU DNNs). The paper also provides two experiments (learning the solution of a 2-dimensional PDE and applications in generative adversarial networks), which are meant to demonstrate that rational activations have advantages compared to other popular activations (ReLU, sine, tanh, polynomial, etc.) when used in actual DNN training. The theory presented in the paper establishes that: (1) Consider two problems: (i) approximating (in the uniform norm) a function implemented with rational DNNs using ReLU DNNs; and (ii) approximating a function implemented with ReLU DNNs using rational DNNs. Theorem 3 shows that (ii) is much easier than (i): (ii) can be solved to eps-precision with log(log(1 / eps)) many parameters, whereas (i) requires at least log(1 / eps) parameters (exponentially more).
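To get a feel for the gap between the two scalings quoted in the review, the short sketch below tabulates log(1 / eps) against log(log(1 / eps)) for a few precisions. It ignores all constants and any dependence on the size of the network being approximated, so the numbers show growth rates only, not actual parameter counts; the function name is hypothetical.

```python
import math


def scalings(eps):
    """Return (log(1/eps), log(log(1/eps))), with all constants ignored."""
    single = math.log(1.0 / eps)
    return single, math.log(single)


for eps in (1e-2, 1e-4, 1e-8, 1e-16):
    single, double = scalings(eps)
    print(f"eps = {eps:.0e}: log(1/eps) = {single:6.2f}, "
          f"log(log(1/eps)) = {double:5.2f}")
```

Squaring the precision (for example, going from 1e-8 to 1e-16) doubles the first quantity but only adds log(2) to the second.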