arXiv preprint
Breaking the Activation Function Bottleneck through Adaptive Parameterization
Sebastian Flennerhag, Hujun Yin, John Keane, Mark Elliot
Adaptive parameterization is a means of increasing this flexibility and thereby increasing the model's capacity to learn non-linear patterns. We focus on the feed-forward layer, $f(x) := \phi(Wx + b)$, for some activation function $\phi: \mathbb{R} \mapsto \mathbb{R}$. Define the pre-activation layer as $a = A(x) := Wx + b$ and denote by $g(a) := \phi(a)/a$ the activation effect of $\phi$ given $a$, where division is element-wise.
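To make the definition of the activation effect concrete, the following is a minimal sketch in plain Python. The helper names (`pre_activation`, `activation_effect`), the toy values of `W`, `x`, and `b`, and the choice of ReLU and tanh as example activations are assumptions made for illustration only; they are not taken from the paper. The sketch checks that, wherever $a \neq 0$, the layer output can be recovered as the element-wise product of the activation effect and the pre-activation.

```python
import math

def pre_activation(W, x, b):
    """Compute a = W x + b, with W given as a list of rows (illustrative helper)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def activation_effect(a, activation):
    """Element-wise g(a) = activation(a) / a.

    The ratio is undefined where a_i == 0; returning nan there is an
    assumption of this sketch, not something the text specifies.
    """
    return [activation(a_i) / a_i if a_i != 0 else float("nan") for a_i in a]

def relu(z):
    """Example activation (assumed for illustration)."""
    return max(z, 0.0)

# Toy values, chosen only for illustration.
W = [[1.0, -2.0], [0.5, 0.5]]
x = [1.0, 2.0]
b = [0.5, 0.0]

a = pre_activation(W, x, b)

# For ReLU the activation effect acts as a gate:
# 0 where a_i is negative, 1 where a_i is positive.
g_relu = activation_effect(a, relu)

# For tanh the activation effect is a smooth, input-dependent scaling.
g_tanh = activation_effect(a, math.tanh)

# The layer output is recovered as the element-wise product g(a) * a.
f_relu = [g_i * a_i for g_i, a_i in zip(g_relu, a)]
f_tanh = [g_i * a_i for g_i, a_i in zip(g_tanh, a)]
```

Read this way, the activation function contributes a scaling of the linear map $a = Wx + b$ that depends on the input only through $a$ itself.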