A Experimental Setup

Neural Information Processing Systems 

A.2 Training Settings of Teacher

We provide training settings of the teacher w.r.t. …

In practice, we do not optimize the student and the generator via the plain losses in Eq. 4 and Eq. 6, …

Number of steps for pretraining G, δ: … the bound in Eqs. …

A.4 Generator Architectures

In Table 8, we show different architectures of the generator w.r.t. … ResNetBlockY are provided in Table 9. ConvBlockX(c…

This is because the "uncond" generator has learned to jump … The "sum" generator enables stable training of our model and gives the best accuracy and cross-entropy … The "cat" generator only yields good results at … The "uncond" generator does not encounter any problem with … MAD to learn faster than the "cat" generator.

An important question is "What is a reasonable upper bound …
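The surviving fragments contrast three ways of conditioning the generator on a class label. As a minimal sketch of what these names plausibly denote (the helper `embed` and all dimensions are assumptions, not taken from the paper): an "uncond" generator ignores the label entirely, a "cat" generator concatenates a label embedding onto the noise vector, and a "sum" generator adds the embedding to the noise element-wise, keeping the input dimension unchanged.

```python
import random


def embed(label, dim):
    # Hypothetical label-embedding lookup: a fixed pseudo-random
    # vector per label, standing in for a learned embedding table.
    rng = random.Random(label)
    return [rng.random() for _ in range(dim)]


def generator_input(z, label, mode):
    """Build the generator's input under the three conditioning
    schemes named in the text: 'uncond', 'cat', or 'sum'."""
    if mode == "uncond":
        return z  # label is ignored
    e = embed(label, len(z))
    if mode == "cat":
        return z + e  # list concatenation: dimension doubles
    if mode == "sum":
        # element-wise addition: dimension stays len(z)
        return [zi + ei for zi, ei in zip(z, e)]
    raise ValueError(f"unknown conditioning mode: {mode}")
```

Note the practical difference this sketch exposes: "cat" changes the input dimensionality (so the first generator layer must be sized differently per scheme), while "sum" and "uncond" keep it fixed, which is consistent with the text's observation that the variants behave differently during training.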