

Appendix

Neural Information Processing Systems

We experiment with 8 implementations of MoCaD, i.e., two different calibrators combined with four different ensembling strategies, the same as in previous experiments. For Learned-Mixin, the entropy term weight is set to the value suggested by [1]. We run each experiment five times and report the mean scores and standard deviations. For the Dirichlet calibrator, we use the same configuration as in FEVER. Experimental Results: Table 2 shows the experimental results on image classification.
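As a rough illustration of the Dirichlet calibrator mentioned above (a sketch, not the authors' implementation; the parameters `W` and `b` are hypothetical learned quantities), Dirichlet calibration applies a linear map to the log-probabilities of a classifier and re-normalizes with a softmax:

```python
import numpy as np

def dirichlet_calibrate(probs, W, b):
    """Dirichlet-calibration sketch: linear map on log-probabilities,
    followed by a softmax. W and b are assumed to have been fit on a
    held-out validation set (fitting is not shown here)."""
    z = np.log(np.clip(probs, 1e-12, 1.0)) @ W.T + b
    z -= z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Identity W and zero b leave the predicted distribution unchanged.
p = np.array([[0.6, 0.3, 0.1]])
out = dirichlet_calibrate(p, np.eye(3), np.zeros(3))
assert np.allclose(out, p)
```

With an identity map the calibrator is a no-op; a learned `W`/`b` can sharpen or flatten the predicted distributions per class.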




Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels

Neural Information Processing Systems

Based on this observation, we adopt a generalized version of the Jensen-Shannon divergence for multiple distributions to encourage consistency around data points. Using this loss function, we show state-of-the-art results on both synthetic (CIFAR) and real-world (e.g., WebVision) noise with varying noise rates.
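The generalized Jensen-Shannon divergence over M distributions can be sketched as the entropy of the weighted mixture minus the weighted entropies, JS = H(Σᵢ wᵢ pᵢ) − Σᵢ wᵢ H(pᵢ). A minimal NumPy illustration (not the paper's loss implementation):

```python
import numpy as np

def entropy(p):
    # Shannon entropy in nats; 0 * log(0) is treated as 0
    p = np.asarray(p, dtype=float)
    return -np.sum(np.where(p > 0, p * np.log(p), 0.0))

def generalized_js(dists, weights=None):
    """Generalized Jensen-Shannon divergence over M distributions:
    JS = H(sum_i w_i p_i) - sum_i w_i H(p_i), with uniform weights
    by default."""
    dists = np.asarray(dists, dtype=float)
    m = len(dists)
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, float)
    mixture = w @ dists  # weighted mixture distribution
    return entropy(mixture) - np.dot(w, [entropy(p) for p in dists])

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])
assert abs(generalized_js([p, p, p])) < 1e-12  # identical dists -> 0
assert generalized_js([p, q]) > 0              # disagreement -> positive
```

The divergence is zero exactly when all distributions agree, which is what makes it a natural consistency penalty around each data point.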



Learning to Propagate for Graph Meta-Learning

Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

Neural Information Processing Systems

In most meta-learning methods, tasks are implicitly related by sharing parameters or an optimizer. We develop a novel meta-learner of this type for prototype-based classification, in which a prototype is generated for each class, such that nearest-neighbor search among the prototypes produces an accurate classification.
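The prototype-based classification described above can be sketched as follows (a toy illustration, not the paper's meta-learner, assuming embeddings are already computed by some backbone):

```python
import numpy as np

def class_prototypes(embeddings, labels, num_classes):
    # Prototype = mean embedding of each class's support examples.
    return np.stack([embeddings[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def classify(query, prototypes):
    # Nearest-prototype search under Euclidean distance.
    d = np.linalg.norm(prototypes - query, axis=1)
    return int(np.argmin(d))

# Toy 2-class example in 2-D embedding space.
emb = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
lab = np.array([0, 0, 1, 1])
protos = class_prototypes(emb, lab, 2)
assert classify(np.array([0.2, 0.3]), protos) == 0
assert classify(np.array([4.9, 5.5]), protos) == 1
```

The meta-learning question is how the prototypes (and the embedding) are generated for a new task; the mean-of-support rule here is only the simplest instance.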




Supplementary Material

Neural Information Processing Systems

Recall the definition: a set function f(S) is submodular if, for any subsets S ⊆ S′ ⊆ Z and any i ∈ Z \ S′, f(S ∪ {i}) − f(S) ≥ f(S′ ∪ {i}) − f(S′). For the experiments in Section 5.2, all checkpoints are instances of ResNet-50. They are trained with a batch size of 128 and an initial learning rate of 0.1. We run for 200 epochs, with learning rate decay at the 60th, 120th, and 160th epochs. A typical validation accuracy from these checkpoints (each on its own task) is about 83%, which is reasonably good.
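The diminishing-returns definition above can be checked by brute force on a small ground set. A sketch using set coverage, a classic submodular function (illustrative only; the function and sets here are not from the paper):

```python
from itertools import combinations

def coverage(sets, S):
    # f(S) = size of the union of the chosen sets: a classic
    # monotone submodular function.
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

def is_submodular(f, ground):
    # Brute-force check of diminishing returns:
    # f(S ∪ {i}) − f(S) ≥ f(S′ ∪ {i}) − f(S′) for all S ⊆ S′ ⊆ Z, i ∉ S′.
    subsets = [set(c) for r in range(len(ground) + 1)
               for c in combinations(sorted(ground), r)]
    for S in subsets:
        for Sp in subsets:
            if not S <= Sp:
                continue
            for i in ground - Sp:
                if f(S | {i}) - f(S) < f(Sp | {i}) - f(Sp):
                    return False
    return True

sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}}
assert is_submodular(lambda S: coverage(sets, S), {0, 1, 2})
```

The brute-force check is exponential in |Z|, so it is only a sanity test; in practice submodularity is established analytically.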


A Relationship with Evolution Strategies (ES)

In the main paper, we restrict the gradient to the random base

Neural Information Processing Systems

Formally, this constraint also applies to special cases of Natural Evolution Strategies [37, 3]. Similar estimators can be obtained for other symmetric distributions with finite second moment. Moreover, the additional hyperparameter σ that determines the magnitude of the perturbation needs to be carefully chosen [33]. Figure B.7: Validation accuracy after 100 epochs and mean gradient correlation with SGD, plotted against increasing subspace dimensionality d on the CIFAR-10 CNN (average of three runs). As expected, the mean cosine similarity across 100 pairs of random vectors decreases with growing dimensionality.
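The role of σ can be seen in a standard antithetic ES gradient estimator, grad ≈ 1/(2nσ) Σ [f(θ+σε) − f(θ−σε)] ε with ε ~ N(0, I). A minimal sketch (not the paper's estimator; function and hyperparameters are illustrative):

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_pairs=50, rng=None):
    """Antithetic ES gradient estimate:
    grad ~= 1/(2 n sigma) * sum_k [f(theta + sigma*eps_k)
                                   - f(theta - sigma*eps_k)] * eps_k.
    sigma controls perturbation magnitude and must be tuned: too small
    amplifies noise, too large biases the finite-difference."""
    rng = np.random.default_rng(rng)
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return grad / (2 * n_pairs * sigma)

# For the quadratic f(x) = ||x||^2 the true gradient is 2x.
theta = np.array([1.0, -2.0])
g = es_gradient(lambda x: np.sum(x**2), theta,
                sigma=0.05, n_pairs=2000, rng=0)
assert np.allclose(g, 2 * theta, atol=0.5)
```

For a quadratic objective the antithetic estimator is unbiased for any σ, which is why the test tolerance only needs to absorb Monte Carlo noise; for general f, σ trades variance against finite-difference bias.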