Several recent papers investigate Active Learning (AL) for mitigating the data dependence of deep learning for natural language processing. However, the applicability of AL to real-world problems remains an open question. While in supervised learning, practitioners can try many different methods, evaluating each against a validation set before selecting a model, AL affords no such luxury. Over the course of one AL run, an agent annotates its dataset exhausting its labeling budget. Thus, given a new task, an active learner has no opportunity to compare models and acquisition functions. This paper provides a large scale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions. We find that across all settings, Bayesian active learning by disagreement, using uncertainty estimates provided either by Dropout or Bayes-by Backprop significantly improves over i.i.d. baselines and usually outperforms classic uncertainty sampling.
We propose a new active learning strategy designed for deep neural networks. The goal is to minimize the number of data annotation queried from an oracle during training. Previous active learning strategies scalable for deep networks were mostly based on uncertain sample selection. In this work, we focus on examples lying close to the decision boundary. Based on theoretical works on margin theory for active learning, we know that such examples may help to considerably decrease the number of annotations. While measuring the exact distance to the decision boundaries is intractable, we propose to rely on adversarial examples. We do not consider anymore them as a threat instead we exploit the information they provide on the distribution of the input space in order to approximate the distance to decision boundaries. We demonstrate empirically that adversarial active queries yield faster convergence of CNNs trained on MNIST, the Shoe-Bag and the Quick-Draw datasets.
We study the problem of training machine learning models incrementally using active learning with access to imperfect or noisy oracles. We specifically consider the setting of batch active learning, in which multiple samples are selected as opposed to a single sample as in classical settings so as to reduce the training overhead. Our approach bridges between uniform randomness and score based importance sampling of clusters when selecting a batch of new samples. Experiments on benchmark image classification datasets (MNIST, SVHN, and CIFAR10) shows improvement over existing active learning strategies. We introduce an extra denoising layer to deep networks to make active learning robust to label noises and show significant improvements.
Supervised machine learning methods usually require a large set of labeled examples for model training. However, in many real applications, there are plentiful unlabeled data but limited labeled data; and the acquisition of labels is costly. Active learning (AL) reduces the labeling cost by iteratively selecting the most valuable data to query their labels from the annotator. This article introduces a Python toobox ALiPy for active learning. ALiPy provides a module based implementation of active learning framework, which allows users to conveniently evaluate, compare and analyze the performance of active learning methods. In the toolbox, multiple options are available for each component of the learning framework, including data process, active selection, label query, results visualization, etc. In addition to the implementations of more than 20 state-of-the-art active learning algorithms, ALiPy also supports users to easily configure and implement their own approaches under different active learning settings, such as AL for multi-label data, AL with noisy annotators, AL with different costs and so on. The toolbox is well-documented and open-source on Github, and can be easily installed through PyPI.
In this paper, we proposed a unified and principled method for both querying and training process in deep batch active learning. We provided the theoretical insights from the intuition of modeling the interactive procedure in active learning as distribution matching, by adopting Wasserstein distance. As a consequence, we derived a new training loss from the theoretical analysis, which is decomposed into optimizing deep neural network parameter and batch query selection through alternative optimization. In addition, the loss for training deep neural network is naturally formulated as a min-max optimization problem through leveraging the unlabeled data information. Moreover, the proposed principles also indicate an explicit uncertainty-diversity trade-off in the query batch selection. Finally we evaluated our proposed method for different benchmarks, showing consistently better empirical performances and more time efficient query strategy, comparing to several baselines.