Collaborating Authors

 Mark Elliot



Breaking the Activation Function Bottleneck through Adaptive Parameterization

Neural Information Processing Systems

Standard neural network architectures are non-linear only by virtue of a simple element-wise activation function, making them both brittle and excessively large. In this paper, we consider methods for making the feed-forward layer more flexible while preserving its basic structure. We develop simple drop-in replacements that learn to adapt their parameterization conditional on the input, thereby increasing statistical efficiency significantly.
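
As an illustration of what "adapting the parameterization conditional on the input" could mean, the following is a minimal sketch and not the paper's actual method. It assumes a hypothetical design in which a small "policy" map produces one scale per input dimension, and those scales rescale the columns of a shared weight matrix before the usual activation is applied. The names `AdaptiveLayer`, `policy`, and `effective_weights`, the `1 + tanh` scaling, and the initialization are all assumptions made for this example.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random initialization (an arbitrary choice for the sketch)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AdaptiveLayer:
    """Feed-forward layer whose effective weights depend on the input.

    Hypothetical form: y = tanh(W diag(d(x)) x + b), with d(x) = 1 + tanh(P x).
    """

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.W = init_matrix(out_dim, in_dim, rng)  # shared base weights
        self.b = [0.0] * out_dim                    # bias
        self.P = init_matrix(in_dim, in_dim, rng)   # assumed policy weights

    def policy(self, x):
        # One scale per input dimension, in (0, 2); equals 1 when P x = 0.
        return [1.0 + math.tanh(z) for z in matvec(self.P, x)]

    def effective_weights(self, x):
        # Rescale column j of W by d_j(x): the input-conditioned parameters.
        d = self.policy(x)
        return [[w * dj for w, dj in zip(row, d)] for row in self.W]

    def forward(self, x):
        w_x = self.effective_weights(x)
        pre = [z + bi for z, bi in zip(matvec(w_x, x), self.b)]
        return [math.tanh(z) for z in pre]  # ordinary element-wise activation


layer = AdaptiveLayer(in_dim=3, out_dim=2)
y = layer.forward([1.0, -0.5, 0.25])
```

In this sketch the layer keeps the basic structure of a feed-forward layer (a linear map followed by an element-wise activation), but the linear map applied to each input differs from input to input, so the non-linearity no longer comes from the activation function alone.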