Table 4: Selected learning rates for all methods. Columns: Method, Learning rate.
Neural Information Processing Systems
Datasets: We run all experiments on the standard GLUE benchmark [18], released under a Creative Commons license (CC BY 4.0), and on the SuperGLUE benchmark [19].

Low-resource fine-tuning: For the experiment in Section 5.6, we set the number of epochs to 1000, 200, 100, 50, and 25 for datasets subsampled to 100, 500, 1000, 2000, and 4000 examples, respectively. In our experiments, this is sufficient for the models to converge. We save a checkpoint every 250 steps for all models and, for each task, report the results for the hyper-parameters that perform best on the validation set.

Data pre-processing: Following Raffel et al. [3], we cast all datasets into a sequence-to-sequence format.
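The low-resource epoch schedule can be checked with a short sketch. It multiplies out the epoch counts and subsample sizes stated in the text, which shows that every setting passes over the same total number of training examples (100,000), and it estimates how many 250-step checkpoints each run would produce. The batch size is not given in this section, so `batch_size` is a hypothetical parameter, and the function names are illustrative rather than taken from the authors' code.

```python
import math

# Subsample size -> number of epochs, as stated in the text.
EPOCHS_BY_SIZE = {100: 1000, 500: 200, 1000: 100, 2000: 50, 4000: 25}

# Checkpoint interval in optimizer steps, as stated in the text.
CHECKPOINT_EVERY = 250


def examples_seen(size: int) -> int:
    """Total number of training examples processed over all epochs."""
    return size * EPOCHS_BY_SIZE[size]


def num_checkpoints(size: int, batch_size: int) -> int:
    """Number of checkpoints saved during one run.

    ASSUMPTION: batch_size is hypothetical (not given in the text), and
    the final partial batch of each epoch counts as one step.
    """
    steps_per_epoch = math.ceil(size / batch_size)
    total_steps = steps_per_epoch * EPOCHS_BY_SIZE[size]
    return total_steps // CHECKPOINT_EVERY


def best_checkpoint(validation_scores: dict) -> int:
    """Return the step whose checkpoint has the highest validation score."""
    return max(validation_scores, key=validation_scores.get)


if __name__ == "__main__":
    for size in EPOCHS_BY_SIZE:
        # batch_size=32 is an illustrative value only.
        print(size, examples_seen(size), num_checkpoints(size, batch_size=32))
```

Under these assumptions, the schedule reads as a fixed budget of example passes spread over more epochs as the dataset shrinks. The number of optimizer steps, and therefore the number of checkpoints, still depends on the batch size actually used.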
Apr-24-2026, 13:31:20 GMT