Appendix
We experiment with 8 implementations of MoCaD, i.e., two different calibrators combined with four different ensembling strategies, the same as in the previous experiments. For Learned-Mixin, the entropy term weight is set to the value suggested by [1]. We run each experiment five times and report the mean scores and standard deviations. For the Dirichlet calibrator, we use the same configuration as in FEVER.
Experimental Results
Table 2 shows the experimental results on image classification.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore (0.04)
- North America > United States > Maryland (0.05)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Law (0.67)
- Government (0.67)
Learning to Propagate for Graph Meta-Learning
Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
In most meta-learning methods, tasks are implicitly related by sharing parameters or an optimizer. We develop a novel meta-learner of this type for prototype-based classification, in which a prototype is generated for each class, such that nearest-neighbor search among the prototypes produces an accurate classification.
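The prototype-based classification described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy 2-D "embeddings", and the use of Euclidean distance are our own assumptions.

```python
import numpy as np

def prototypes(support_x, support_y, n_classes):
    """Compute one prototype per class as the mean of its support embeddings."""
    return np.stack([support_x[support_y == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_x, protos):
    """Assign each query to the class of its nearest prototype (Euclidean)."""
    # distances: shape (n_query, n_classes), via broadcasting
    d = np.linalg.norm(query_x[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

# toy 2-class example with 2-D "embeddings"
support_x = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
support_y = np.array([0, 0, 1, 1])
protos = prototypes(support_x, support_y, n_classes=2)
query = np.array([[0.2, 0.5], [4.8, 5.5]])
print(classify(query, protos))  # → [0 1]
```

In a real meta-learning setup the embeddings would come from a learned encoder; here they are raw coordinates for clarity.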
- Oceania > Australia (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States (0.28)
- Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
- Asia > India (0.04)
- Health & Medicine (0.46)
- Government (0.46)
Supplementary Material
Recall the definition: a set function $f(S)$ is submodular if, for any subsets $S \subseteq S' \subseteq Z$ and any $i \in Z \setminus S'$, $f(S \cup \{i\}) - f(S) \ge f(S' \cup \{i\}) - f(S')$. For the experiments in Section 5.2, all checkpoints are instances of ResNet-50. They are trained with a batch size of 128 and an initial learning rate of 0.1. We run for 200 epochs, with learning-rate decay at the 60th, 120th, and 160th epochs. A typical validation accuracy for these checkpoints (on their own tasks) is about 83% (reasonably good).
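The diminishing-returns inequality above can be verified exhaustively for a small ground set. The coverage function and helper names below are illustrative examples we introduce here, not objects from the paper; set coverage is a standard submodular function.

```python
from itertools import combinations

# Toy coverage function f(S) = |union of covered elements|; coverage is submodular.
cover = {0: {1, 2}, 1: {2, 3}, 2: {4}}

def f(S):
    return len(set().union(*(cover[i] for i in S))) if S else 0

def is_submodular(f, ground):
    """Brute-force check of f(S ∪ {i}) - f(S) >= f(S' ∪ {i}) - f(S')
    for all S ⊆ S' ⊆ ground and i in ground \\ S'."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for Sp in subsets:
        for S in (T for T in subsets if T <= Sp):
            for i in ground - Sp:
                if f(S | {i}) - f(S) < f(Sp | {i}) - f(Sp):
                    return False
    return True

ground = frozenset(cover)
print(is_submodular(f, ground))  # → True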
A Relationship with Evolution Strategies (ES)
In the main paper, we restrict the gradient to the random base
Formally, this constraint also applies to special cases of Natural Evolution Strategies [37, 3]. Similar estimators can be obtained for other symmetric distributions with finite second moment. Moreover, the additional hyperparameter σ, which determines the magnitude of the perturbation, needs to be carefully chosen [33].
Figure B.7: Validation accuracy after 100 epochs and mean gradient correlation with SGD, plotted against increasing subspace dimensionality d on the CIFAR-10 CNN (average of three runs). As expected, the mean cosine similarity across 100 pairs of random vectors decreases with growing dimensionality.
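A minimal sketch of the kind of perturbation-based estimator discussed above, using antithetic Gaussian perturbations. The function name `es_gradient` and all parameter choices (σ, sample count, the quadratic test function) are our own illustration under stated assumptions, not the paper's method; it does show how σ sets the perturbation magnitude.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n=256, rng=None):
    """Antithetic ES gradient estimate:
    g ≈ (1 / (2σn)) Σ_k [f(θ + σ ε_k) − f(θ − σ ε_k)] ε_k,  ε_k ~ N(0, I).
    σ controls the perturbation magnitude and must be chosen carefully."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal((n, theta.size))
    diffs = np.array([f(theta + sigma * e) - f(theta - sigma * e) for e in eps])
    return (diffs[:, None] * eps).sum(axis=0) / (2 * sigma * n)

# Sanity check on f(θ) = ||θ||², whose true gradient is 2θ
theta = np.array([1.0, -2.0])
g = es_gradient(lambda x: (x ** 2).sum(), theta, sigma=0.05, n=2000)
print(g)  # ≈ [2., -4.]
```

For this quadratic the antithetic differences are exactly linear in θ·ε, so the estimate converges quickly; for general objectives the variance grows with dimensionality, consistent with the cosine-similarity decay noted in Figure B.7.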