Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
