Supplementary Material

Neural Information Processing Systems 

The train, test, and validation splits for SST2 [47] and SST5 [47] are taken directly from the source, while the validation data for TREC6 [35, 18] is obtained by holding out 10% of the train data. The test data for glue-SST2 [51] is obtained by holding out 5% of the train data. A seed value of 42 is used in the generator argument of the random_split function of torch. In Table 1, we summarize the number of classes and the number of instances in each split of the text datasets.
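
The seeded hold-out described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the exact code used for the experiments: the helper name split_train_val, the toy dataset, and the choice to round the held-out size down are all hypothetical. Only the seed value of 42 and the use of the generator argument of random_split come from the text.

```python
import torch
from torch.utils.data import TensorDataset, random_split


def split_train_val(dataset, holdout_fraction=0.1, seed=42):
    """Hold out a fraction of `dataset` with a seeded, reproducible split.

    Hypothetical helper: the rounding rule (floor) is an assumption,
    since the text only gives the percentage and the seed.
    """
    n_holdout = int(len(dataset) * holdout_fraction)
    n_train = len(dataset) - n_holdout
    # A dedicated generator keeps the split independent of the global RNG state.
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_holdout], generator=generator)


# Toy stand-in for a train set with 100 instances (not the real data).
toy = TensorDataset(torch.arange(100))
train_subset, val_subset = split_train_val(toy, holdout_fraction=0.1)
```

Calling the helper again with the same seed yields the same indices, which is what makes the derived validation and test splits reproducible. Under the same assumptions, a 5% hold-out would use holdout_fraction=0.05.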