How to Choose an Activation Function
Mhaskar, H. N., Micchelli, C. A.
Neural Information Processing Systems
In [10], we have shown that such a network using practically any nonlinear activation function can approximate any continuous function of any number of real variables on any compact set to any desired degree of accuracy. A central question in this theory is the following. If one needs to approximate a function from a known class of functions to a prescribed accuracy, how many neurons will be necessary to accomplish this approximation for all functions in the class?
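The question of how many neurons a prescribed accuracy requires can be made concrete with a small numerical experiment. The sketch below is an illustration only, not the construction or the bounds from the paper. It assumes a one-hidden-layer network with ReLU activation (our choice), a single real variable on the interval [0, 1], and a simple interpolation scheme with a doubling search. The helper names (build_relu_interpolant, evaluate, max_error, neurons_needed) are hypothetical. The count it returns is only an upper bound for this particular scheme, not the minimal number of neurons.

```python
import math


def relu(t):
    """ReLU activation, chosen here purely for illustration."""
    return t if t > 0.0 else 0.0


def build_relu_interpolant(f, n):
    """Build a one-hidden-layer ReLU network with n neurons that
    interpolates f at n + 1 equispaced knots on [0, 1].

    The network has the form  bias + sum_k weights[k] * relu(x - knots[k]).
    """
    h = 1.0 / n
    knots = [k * h for k in range(n)]            # one neuron per knot x_0 .. x_{n-1}
    values = [f(k * h) for k in range(n + 1)]    # targets at x_0 .. x_n
    slopes = [(values[k + 1] - values[k]) / h for k in range(n)]
    # The output weight of neuron k is the change in slope at knot k.
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    return values[0], weights, knots


def evaluate(net, x):
    """Evaluate the network at a point x in [0, 1]."""
    bias, weights, knots = net
    return bias + sum(a * relu(x - t) for a, t in zip(weights, knots))


def max_error(f, net, samples=2001):
    """Estimate the uniform error on [0, 1] using an equispaced sample grid."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - evaluate(net, x)) for x in grid)


def neurons_needed(f, eps, n_max=4096):
    """Double n until the sampled error is at most eps.

    Returns the first n in 1, 2, 4, ... that succeeds, or None if n_max is
    exceeded. This is an upper bound for this scheme only.
    """
    n = 1
    while n <= n_max:
        if max_error(f, build_relu_interpolant(f, n)) <= eps:
            return n
        n *= 2
    return None


if __name__ == "__main__":
    target = lambda x: math.sin(2.0 * math.pi * x)
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, neurons_needed(target, eps))
```

Running the script prints, for each tolerance, the neuron count at which this scheme first meets it for one sample target. The abstract's question is harder: it asks for a count that works for every function in a given class, and for how that count depends on the activation function.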
Dec-31-1994