Oceania
Supplementary Material

Proof of Proposition
Referring to Eq. (3), we see that the left side equals $H(\{i\} \mid S)$ [...]

For the experiments in Section 5.1, we use a batch size of 32 sentences and the Adam optimizer with a learning rate of 1e-3. We train for 40 epochs and report the test metric at the epoch with the best validation performance.

For the experiments in Section 5.2, all checkpoints are instances of ResNet-50. They are trained with a batch size of 128 and an initial learning rate of 0.1. We train for 200 epochs, decaying the learning rate at the 60th, 120th and 160th epochs.
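The two training protocols above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code used for the experiments. The function names `step_decay_lr` and `select_test_metric_at_best_val` are hypothetical, the decay factor `ASSUMED_GAMMA = 0.1` is an assumption (the text does not state the factor), and epochs are assumed to be 0-indexed, so the first decay takes effect at epoch index 60.

```python
from bisect import bisect_right

# Values stated in the text for the Section 5.2 checkpoints.
INITIAL_LR = 0.1
MILESTONES = (60, 120, 160)
NUM_EPOCHS = 200

# ASSUMPTION: the decay factor is not given in the text; 0.1 is a placeholder.
ASSUMED_GAMMA = 0.1


def step_decay_lr(epoch, initial_lr=INITIAL_LR, milestones=MILESTONES,
                  gamma=ASSUMED_GAMMA):
    """Learning rate at a 0-indexed epoch under a piecewise-constant schedule.

    The rate is multiplied by `gamma` once for every milestone that is
    less than or equal to `epoch`.
    """
    num_decays = bisect_right(milestones, epoch)
    return initial_lr * gamma ** num_decays


def select_test_metric_at_best_val(val_metrics, test_metrics,
                                   higher_is_better=True):
    """Return (epoch index, test metric) at the best validation epoch.

    Both lists hold one value per epoch. Ties are resolved in favour of
    the earliest epoch.
    """
    if not val_metrics or len(val_metrics) != len(test_metrics):
        raise ValueError("need one validation and one test value per epoch")
    pick = max if higher_is_better else min
    best_epoch = pick(range(len(val_metrics)), key=val_metrics.__getitem__)
    return best_epoch, test_metrics[best_epoch]


if __name__ == "__main__":
    # Print the learning rate at the start and around each milestone.
    for epoch in (0, 59, 60, 119, 120, 159, 160, NUM_EPOCHS - 1):
        print(epoch, step_decay_lr(epoch))

    # Toy per-epoch metrics (made up for illustration only).
    print(select_test_metric_at_best_val([0.5, 0.9, 0.7], [0.4, 0.8, 0.85]))
```

The schedule has the same piecewise-constant shape that multi-step schedulers such as PyTorch's `MultiStepLR` produce. If the actual decay factor or epoch indexing differs, only `ASSUMED_GAMMA` and the milestone comparison need to change.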