Is it possible to scale the activation function instead of batch-normalization? • r/MachineLearning
