Dataset     Train    Test    # Classes
CIFAR10     50,000   10,000  10
CIFAR100    50,000   10,000  100
SVHN        73,257   26,032  10

Table 3: CIFAR10, CIFAR100, and SVHN dataset statistics.
The mean and standard error, computed across ten trials, are shown. In this section, we expand on Section 3 by providing additional details and experimental results on the scalability of baseline methods and Cluster-Margin. Table 3 contains relevant statistics about the CIFAR10, CIFAR100, and SVHN datasets, which were omitted from the main body of the paper.

A.1 Baseline Scalability

As discussed in Section 3, we improve BADGE's scalability on certain datasets by partitioning the unlabeled pool into subsets and running BADGE independently on each subset. Specifically, if the size of the unlabeled pool is n and the batch size is k, we partition the pool uniformly at random into m sets and run BADGE independently with a target batch size of k/m in each partition.
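The partitioning scheme above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `run_badge` is a hypothetical stand-in for the BADGE selection routine, which here simply receives a partition and a per-partition target batch size of k/m.

```python
import random

def partitioned_badge(unlabeled_pool, k, m, run_badge):
    """Split the unlabeled pool uniformly at random into m subsets and
    run BADGE independently on each, with target batch size k/m per subset.
    `run_badge(subset, batch_size)` is a placeholder for the actual BADGE
    selection routine (assumption for illustration)."""
    pool = list(unlabeled_pool)
    random.shuffle(pool)  # uniform random assignment to partitions
    partitions = [pool[i::m] for i in range(m)]
    batch = []
    for part in partitions:
        # each partition contributes k/m examples to the final batch
        batch.extend(run_badge(part, k // m))
    return batch
```

Since BADGE's cost grows superlinearly in the pool size, running it on m pools of size n/m is substantially cheaper than one run on the full pool, at the cost of selecting each sub-batch with less global information.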
Batch Active Learning at Scale
Gui Citovsky, Giulia DeSalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar
The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. The practical benefits of batch sampling come with the downside of less adaptivity and the risk of sampling redundant examples within a batch -- a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than those used in previous studies and provides significant improvements in model training efficiency compared to recent baselines. Finally, we provide an initial theoretical analysis, proving label complexity guarantees for a related sampling method, which we show is approximately equivalent to our sampling method in specific settings.
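The combination of uncertainty and diversity the abstract describes can be illustrated with a small sketch. This is a hedged approximation, not the paper's exact algorithm: uncertainty is measured by the margin between the top-two predicted class probabilities, and diversity is enforced by round-robin sampling over precomputed clusters (the `cluster_ids`, `oversample`, and smallest-cluster-first ordering are illustrative assumptions).

```python
import numpy as np

def margin_scores(probs):
    # Margin = gap between the top-two class probabilities per example;
    # a smaller margin indicates higher model uncertainty.
    sorted_probs = np.sort(probs, axis=1)
    return sorted_probs[:, -1] - sorted_probs[:, -2]

def cluster_margin_select(probs, cluster_ids, k, oversample=10):
    """Select k examples combining uncertainty and diversity (sketch).
    Take the k * oversample lowest-margin candidates, group them by their
    precomputed cluster id, then round-robin over clusters (smallest
    first) so no single dense region dominates the batch."""
    margins = margin_scores(probs)
    candidates = np.argsort(margins)[: k * oversample]
    groups = {}
    for idx in candidates:
        groups.setdefault(cluster_ids[idx], []).append(idx)
    order = sorted(groups.values(), key=len)  # smallest clusters first
    selected = []
    while len(selected) < k:
        for group in order:
            if group and len(selected) < k:
                selected.append(group.pop(0))
        if not any(order):  # all candidate clusters exhausted
            break
    return selected
```

The clustering itself is assumed to be computed once over model embeddings; only the cheap margin scoring and round-robin pass run at each sampling round, which is what makes very large batch sizes tractable.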