Supplementary Material
Neural Information Processing Systems
We provide more details on training the teacher network in Section A, additional experimental results on synthetic functions in Section B, and the hyperparameter settings for the benchmark datasets in Section C.

Here, we omit the iteration subscript t for simplicity. To solve Eq. (10), we obtain the hypergradient with respect to … and backpropagate it to … = {W ∈ ℝ … As shown in Algorithm 1, we train the teacher network for one step each time it is called by an underperforming student model, where a step is one iteration on the synthetic functions and one epoch over the validation set on the benchmark datasets in our experiments.

In Section 4.1, we showed the experimental results of HPM on two popular synthetic functions, namely the Branin and Hartmann6D functions. In the following, we provide more details about the synthetic functions and the implementation, as well as results on two further functions.
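For readers unfamiliar with the Branin function mentioned above, the following is a minimal sketch of its standard textbook form with the commonly used constants. The exact parameterization, search domain, and any rescaling used in the paper's experiments are not stated in this excerpt, so the constants and the function name `branin` here are assumptions for illustration only.

```python
import math


def branin(x1: float, x2: float) -> float:
    """Standard Branin function (assumed constants, not taken from the paper).

    f(x1, x2) = a * (x2 - b*x1^2 + c*x1 - r)^2 + s * (1 - t) * cos(x1) + s
    """
    a = 1.0
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8.0 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1.0 - t) * math.cos(x1) + s


# The three known global minimizers of the standard form
# (the third x1 value is 3*pi rounded to five decimals).
GLOBAL_MINIMIZERS = [(-math.pi, 12.275), (math.pi, 2.275), (9.42478, 2.475)]

if __name__ == "__main__":
    for x1, x2 in GLOBAL_MINIMIZERS:
        print(f"branin({x1:.5f}, {x2:.3f}) = {branin(x1, x2):.6f}")
```

At each of the three minimizers the squared term vanishes and the value reduces to 10/(8π), roughly 0.3979, which makes the function a convenient sanity check for a hyperparameter optimizer before moving on to higher-dimensional functions such as Hartmann6D.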
Feb-6-2025, 07:02:01 GMT