Breaking the Activation Function Bottleneck through Adaptive Parameterization

Neural Information Processing Systems

Adaptive parameterization is a means of increasing this flexibility and thereby increasing the model's capacity to learn non-linear patterns. We focus on the feed-forward layer, f(x) := φ(Wx + b), for some activation function φ: ℝ → ℝ. Define the pre-activation layer as a = A(x) := Wx + b and denote by g(a) := φ(a)/a the activation effect of φ given a, where division is element-wise.
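
To make these definitions concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The helper names (pre_activation, activation_effect, layer_via_effect), the choice of ReLU and tanh as φ, and the numeric values of W, x and b are all assumptions made for illustration. It checks that the layer output can be recovered as f(x) = g(a) ⊙ a wherever a is non-zero; the excerpt does not say how g is handled at a = 0, so the sketch simply avoids that case.

```python
import math


def pre_activation(W, x, b):
    """Compute a = W x + b, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def feed_forward(phi, W, x, b):
    """Compute f(x) = phi(W x + b), applying phi element-wise."""
    return [phi(a_i) for a_i in pre_activation(W, x, b)]


def activation_effect(phi, a):
    """Compute g(a) = phi(a) / a element-wise.

    Assumption: every a_i is non-zero. A zero entry raises
    ZeroDivisionError, since the excerpt leaves that case unspecified.
    """
    return [phi(a_i) / a_i for a_i in a]


def layer_via_effect(phi, a):
    """Recover f(x) as the element-wise product g(a) * a."""
    return [g_i * a_i for g_i, a_i in zip(activation_effect(phi, a), a)]


def relu(z):
    return max(0.0, z)


# Hypothetical values, chosen so that no pre-activation is zero.
W = [[1.0, -2.0], [0.5, 1.0]]
x = [2.0, 1.5]
b = [0.5, -0.25]

a = pre_activation(W, x, b)              # [-0.5, 2.25]

# For ReLU the activation effect is a 0/1 gate on each pre-activation.
g_relu = activation_effect(relu, a)
f_relu = feed_forward(relu, W, x, b)
f_relu_via_g = layer_via_effect(relu, a)

# For tanh the effect is a smooth scaling factor.
g_tanh = activation_effect(math.tanh, a)
f_tanh = feed_forward(math.tanh, W, x, b)
f_tanh_via_g = layer_via_effect(math.tanh, a)
```

Under these assumptions, g acts as an input-dependent scaling of the pre-activation: a hard gate for ReLU and a smooth factor for tanh, which lies strictly between 0 and 1 at any non-zero a.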