
The hyper-parameters are described as follows. The table on the right reports results on CIFAR-10 under various learning rate settings; our learning rate setting performs best among those tested. We also show the training curve of our method. At inference, the cost of DRL is tiny compared to that of the convolutional layers.