The landmark achievements of AlphaGo Zero have created considerable research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search (MCTS) is used to train a deep neural network, which is then used in tree searches. Training itself is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss functions, presumably because of the prohibitive computational cost of exploring the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. We use small games to achieve meaningful exploration with moderate computational effort. The experimental results show that training is highly sensitive to hyper-parameter choices. Through multi-objective analysis we identify 4 important hyper-parameters for further assessment. To start, we find the surprising result that too much training can sometimes lower performance. Our main result is that the number of self-play iterations subsumes MCTS-search simulations, game episodes, and training epochs. The intuition is that these three increase together as self-play iterations increase, and that increasing them individually is sub-optimal. A consequence of our experiments is a direct recommendation for setting hyper-parameter values in self-play: the overarching outer loop of self-play iterations should be maximized, in favor of the three inner-loop hyper-parameters, which should be set at lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.
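The nested loop structure the recommendation refers to can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation; the function and parameter names (`play_one_game`, `train_one_epoch`, and the hyper-parameter arguments) are assumptions chosen for clarity.

```python
# Sketch of an AlphaZero-like self-play training loop, showing how the outer
# loop (self-play iterations) wraps the three inner-loop hyper-parameters:
# game episodes per iteration, MCTS simulations per move, and training epochs.

def play_one_game(net, mcts_simulations):
    # Stub: a real version would run MCTS with `mcts_simulations` rollouts
    # per move and return (state, policy_target, outcome) training examples.
    return [("state", "policy", 0.0)]

def train_one_epoch(net, examples):
    # Stub: a real version would take gradient steps on the examples.
    return net

def self_play_training(net, iterations, episodes, mcts_simulations, epochs):
    games_played = 0
    for _ in range(iterations):                # outer loop: maximize this
        examples = []
        for _ in range(episodes):              # inner: games per iteration
            examples += play_one_game(net, mcts_simulations)
            games_played += 1
        for _ in range(epochs):                # inner: passes over new data
            net = train_one_epoch(net, examples)
    return net, games_played

net, games = self_play_training(net=None, iterations=3, episodes=5,
                                mcts_simulations=100, epochs=2)
print(games)  # 3 iterations * 5 episodes = 15 games
```

The paper's recommendation amounts to increasing `iterations` while keeping `episodes`, `mcts_simulations`, and `epochs` modest, since the inner quantities effectively scale with the outer loop anyway.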
Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this high-dimensional, dynamic process. We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters. This measurement is accomplished by decomposing the components of an approximate path integral along the training trajectory using a Runge-Kutta integrator. This rich view shows which parameters are responsible for decreasing or increasing the loss during training, or which parameters "help" or "hurt" the network's learning, respectively.
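The first-order idea behind this decomposition can be illustrated on a toy quadratic loss, where the path integral along one straight-line training step is evaluated exactly by the midpoint gradient (for general losses the paper uses a Runge-Kutta integrator). All names below are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Toy sketch of Loss Change Allocation on L(theta) = 0.5 * theta^T A theta.
# Because the gradient is linear in theta, the path integral along one
# straight-line gradient-descent step equals the midpoint gradient dotted
# with the displacement, so the per-parameter allocations sum exactly to
# the true loss change.

def loss(theta, A):
    return 0.5 * theta @ A @ theta

def grad(theta, A):
    return A @ theta

A = np.diag([1.0, 4.0, 9.0])            # positive-definite "Hessian"
theta0 = np.array([1.0, -1.0, 0.5])
lr = 0.01
theta1 = theta0 - lr * grad(theta0, A)  # one gradient-descent step

# Per-parameter allocation: midpoint gradient times per-parameter
# displacement. Negative entries "help" (decrease the loss).
lca = grad(0.5 * (theta0 + theta1), A) * (theta1 - theta0)

total_change = loss(theta1, A) - loss(theta0, A)
print(np.allclose(lca.sum(), total_change))  # allocations sum to the change
```

Here every allocation is negative, since plain gradient descent on a convex quadratic moves each parameter toward the minimum; in real training, individual parameters can receive positive allocations (they "hurt") even while the total loss decreases.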
It's hard to be without your favorite shows over the holiday break, but NBC has you covered -- starting today, you can laugh away your winter blues with eight brand new digital episodes of Superstore. Mashable has an exclusive first look at two of the new episodes, which are titled Superstore Training Videos for reasons that will soon become apparent. In the new digital shorts, after watching Cloud 9's horribly outdated corporate training videos from the 1980s, the employees decide to create their own instructional videos. All eight of the training videos will be available today beginning at 11 a.m. And if you haven't caught up with the hilarious workplace comedy yet, you can binge all 21 episodes of the first two seasons that have aired to date on the NBC App, so that you're prepared when Superstore returns with new episodes Thursday, Jan. 5 at 8 p.m. ET/PT.
I am trying to train a VGG-16 model. When I use a single GPU with batch size 80 and learning rate 0.0002, the validation error drops to 35% after 200k iterations. However, when I try to train the same model on the same dataset on 2 GPUs with batch size 40 per GPU and learning rate 0.0002, the error after 200k iterations is much higher, at 45%. So everything is the same in the two settings; the only difference is that I adjust the per-GPU batch size in the multi-GPU case so that the effective batch size stays constant.
Imitation learning is a sequential decision-making task in which the learner tries to mimic an expert's actions in order to achieve the best performance. Several algorithms have been proposed recently for this task. In this project, we aim to provide a broad review of these algorithms, presenting their main features and comparing their performance and regret bounds.