Neural Information Processing Systems 

Let $q_u(\theta)$ be set as in Eq. For notational simplicity, let $\theta' = \theta^{(t-1)}$.

B.1 Hyperparameter settings

Training. Table 2 lists the hyperparameters used for our results in Section 5. Since all methods use expert trajectories to train the Bayesian pseudocoresets, we follow [8] for the hyperparameters related to expert trajectories, such as the number of SGD steps and the maximum random starting point. We found that slightly shorter expert training works better for BPC-fKL, so for BPC-fKL we used an expert step that is one epoch shorter than for BPC-W. For each setting, we report results with the best learning rate from a sweep over {0.01, 0.02, 0.03, 0.04}.
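The learning-rate selection can be illustrated with a minimal sketch. Only the grid {0.01, 0.02, 0.03, 0.04} comes from the text above. The function name `sweep_learning_rate`, the `evaluate` callback, and the "higher score is better" criterion are assumptions made for illustration, and `toy_score` is a stand-in for a real training and evaluation run, not the procedure used in the paper.

```python
from typing import Callable, Dict, Iterable, Tuple

# Grid taken from the text; everything else below is hypothetical.
LEARNING_RATES = (0.01, 0.02, 0.03, 0.04)


def sweep_learning_rate(
    evaluate: Callable[[float], float],
    grid: Iterable[float],
) -> Tuple[float, Dict[float, float]]:
    """Evaluate every learning rate in the grid and return the best one.

    `evaluate` is assumed to train with the given learning rate and return
    a score where higher is better (e.g. validation accuracy).
    """
    scores = {lr: evaluate(lr) for lr in grid}
    best_lr = max(scores, key=scores.get)
    return best_lr, scores


def toy_score(lr: float) -> float:
    # Placeholder objective that peaks at 0.03; replace with a real
    # train-and-evaluate routine for each method and setting.
    return -((lr - 0.03) ** 2)


best_lr, scores = sweep_learning_rate(toy_score, LEARNING_RATES)
print(best_lr, scores)
```

In practice the sweep would be repeated once per setting, since the text selects the best learning rate separately for each one.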
