Supplementary Material Proof of Proposition
Neural Information Processing Systems
Referring to Eq. (3), we see that the left side equals $H(\{i\} \mid S)$ …

For the experiments in Section 5.1, we use a batch size of 32 sentences and the Adam optimizer with a learning rate of 1e-3. We train for 40 epochs and report the test metric at the epoch with the best validation performance.

For the experiments in Section 5.2, all checkpoints are instances of ResNet-50. They are trained with a batch size of 128 and an initial learning rate of 0.1. We train for 200 epochs, with learning rate decay at the 60th, 120th, and 160th epochs.
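The Section 5.2 schedule and the "best validation epoch" reporting rule can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' training code. The decay factor (`GAMMA = 0.1`) is not given in the text and is assumed here. The sketch also assumes that the decayed rate takes effect from the milestone epoch itself and that a higher validation metric is better. The function names `learning_rate` and `report_at_best_validation` are hypothetical.

```python
from bisect import bisect_right

INITIAL_LR = 0.1             # stated in the text (Section 5.2)
MILESTONES = (60, 120, 160)  # decay epochs stated in the text, 1-indexed
GAMMA = 0.1                  # ASSUMPTION: the decay factor is not stated in the text


def learning_rate(epoch, initial_lr=INITIAL_LR, milestones=MILESTONES, gamma=GAMMA):
    """Return the learning rate used during `epoch` (1-indexed).

    ASSUMPTION: the decay is applied starting at the milestone epoch itself.
    """
    # Number of milestones that are less than or equal to the current epoch.
    num_decays = bisect_right(milestones, epoch)
    return initial_lr * gamma ** num_decays


def report_at_best_validation(val_metrics, test_metrics, higher_is_better=True):
    """Return (epoch, test_metric) at the epoch with the best validation metric.

    Both lists hold one value per epoch; the returned epoch is 1-indexed.
    Ties are resolved in favour of the earliest epoch.
    """
    if len(val_metrics) != len(test_metrics) or not val_metrics:
        raise ValueError("metric lists must be non-empty and of equal length")
    sign = 1 if higher_is_better else -1
    best_index = max(range(len(val_metrics)), key=lambda i: sign * val_metrics[i])
    return best_index + 1, test_metrics[best_index]
```

Under these assumptions, `learning_rate(59)` is still the initial rate and `learning_rate(60)` is the first decayed rate. For a loss-like validation metric, pass `higher_is_better=False` to `report_at_best_validation`.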
Aug-16-2025, 12:40:22 GMT