Pool-Based Active Learning with Proper Topological Regions
Hadjadj, Lies, Devijver, Emilie, Molinier, Remi, Amini, Massih-Reza
–arXiv.org Artificial Intelligence
In recent years, machine learning has found successful applications in diverse domains, but it still depends heavily on expensive labeled data: advances in cheap computing and storage have made it easy to store and process large amounts of unlabeled data, yet labeling must often be done by humans or with costly tools. There is therefore a need for general, domain-independent methods that learn effective models from a large amount of unlabeled data together with a minimal amount of labeled data: this is the framework of semi-supervised learning. Active learning aims explicitly to select the observations to be labeled so as to optimize the learning process and efficiently reduce the labeling cost. The primary assumption behind active learning is that machine learning algorithms could reach a higher level of performance with fewer training labels if they were allowed to choose their own training data (Settles, 2009). The most common active learning approaches are pool-based methods (Lewis and Catlett, 1994), which operate on a fixed set of unlabeled observations. First, a few points are labeled to train an initial classification model; then, at each iteration, unlabeled examples are chosen for querying based on the current model's predictions and a predefined priority score. These approaches show their limitations in low-budget regimes because they need a sufficient budget to learn even a weak model (Pourahmadi et al., 2021).
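The pool-based loop described above can be sketched in a few lines. This is a generic illustration only, not the method proposed in the paper: it uses uncertainty sampling (least-confident prediction) as the priority score, with a logistic regression classifier, a synthetic dataset, and an arbitrary seed size and budget, all of which are assumptions for the sake of the example.

```python
# Minimal pool-based active learning sketch with uncertainty sampling.
# Classifier, dataset, seed size, and budget are illustrative choices,
# not taken from the paper under discussion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Start from a small seed of labeled points; the rest form the pool.
labeled = list(rng.choice(len(X), size=10, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

budget = 20  # number of additional labels we may request
for _ in range(budget):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    # Priority score: 1 - max class probability (least confident first).
    probs = clf.predict_proba(X[pool])
    scores = 1.0 - probs.max(axis=1)
    query = pool[int(np.argmax(scores))]
    labeled.append(query)  # the oracle labels the queried point
    pool.remove(query)

# Retrain on the final labeled set and evaluate on the remaining pool.
clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
print(f"labels used: {len(labeled)}, "
      f"pool accuracy: {clf.score(X[pool], y[pool]):.3f}")
```

In a low-budget regime the initial model trained on the tiny seed set is unreliable, so its uncertainty scores are poor query guides, which is exactly the limitation the abstract points out.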
Oct-2-2023