
Neural Information Processing Systems 

Training Procedure  All models are written in PyTorch and trained on GPUs. For each scheduler, we train for 10,000 epochs using the Adam optimizer [16] with a learning rate of 10^-3 and a minibatch size of 1000.

Reward Evaluation  To obtain the bandit feedback in Eq. (7), we use a fixed, linear schedule with d = 50 for calculating L_t with Eq. (5). This yields a tighter bound on log p_θ(x), decouples reward function evaluation from model training and schedule selection in each round, and is still efficient using SNIS in Eq. (4). Estimating appropriate values for them is critical, as this represents the GP's prior regarding the sensitivity of performance w.r.t.
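The SNIS estimator referenced in Eq. (4) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy target and proposal densities below are hypothetical stand-ins, chosen only to show the self-normalized weighting, which lets the target density be known only up to a normalizing constant.

```python
import math
import random

def snis_expectation(f, log_p, log_q, sample_q, n):
    """Self-normalized importance sampling (SNIS) estimate of E_p[f(x)].

    Draws n samples from the proposal q and weights each by the
    (unnormalized) density ratio p/q; normalizing the weights makes any
    constant factors in p and q cancel.
    """
    xs = [sample_q() for _ in range(n)]
    # Log-weights, shifted by their max for numerical stability.
    log_w = [log_p(x) - log_q(x) for x in xs]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return sum(wi * f(xi) for wi, xi in zip(w, xs)) / total

# Hypothetical toy setup: unnormalized target N(1, 1), proposal N(0, 2).
random.seed(0)
log_p = lambda x: -0.5 * (x - 1.0) ** 2        # unnormalized log-density
log_q = lambda x: -0.5 * (x / 2.0) ** 2        # unnormalized log-density
sample_q = lambda: random.gauss(0.0, 2.0)

# Estimate E_p[x]; the true mean of the target is 1.0.
est = snis_expectation(lambda x: x, log_p, log_q, sample_q, 20000)
```

Because the weights are normalized by their sum, both densities may be evaluated without their normalizing constants, which is what makes SNIS convenient when log p_θ(x) is only available up to an intractable constant.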
