active learning

Hand Labeling Considered Harmful


We are traveling through the era of Software 2.0, in which the key components of modern software are increasingly determined by the parameters of machine learning models rather than hard-coded in the language of for loops and if-else statements. Such software and models raise serious challenges, spanning the data they're trained on, how they're developed, how they're deployed, and their impact on stakeholders. These challenges commonly result in algorithmic bias and a lack of model interpretability and explainability. There's another critical issue, which is in some ways upstream of bias and explainability: while we seem to be living in the future with the creation of machine learning and deep learning models, we are still living in the Dark Ages with respect to the curation and labeling of our training data: the vast majority of labeling is still done by hand.

New Research Shows Learning Is More Effective When Active

CMU School of Computer Science

Engaging students through interactive activities, discussions, feedback and AI-enhanced technologies resulted in improved academic performance compared to traditional lectures, lessons or readings, faculty from Carnegie Mellon University's Human-Computer Interaction Institute concluded after collecting research into active learning. The research also found that effective active learning methods use not only hands-on and minds-on approaches but also hearts-on ones, providing increased emotional and social support. Interest in active learning grew as the COVID-19 pandemic challenged educators to find new ways to engage students. Schools and teachers incorporated new technologies to adapt, while students faced the negative psychological effects of isolation, restlessness and inattention brought on by quarantine and remote learning. The pandemic made it clear that traditional approaches to education may not be the best way to learn, but questions persisted about what active learning is and how best to use it to teach, engage and excite students.

Why Unsupervised Machine Learning is the Future of Cybersecurity


As we move towards a future where we lean on cybersecurity much more in our daily lives, it's important to be aware of the differences between the types of AI being used for network security. Over the last decade, Machine Learning has made huge progress with Supervised and Reinforcement Learning, in everything from photo recognition to self-driving cars. However, Supervised Learning is limited in its ability to find network threats because it only looks for specifics it has seen labeled before, whereas Unsupervised Learning constantly searches the network for anomalies. Machine Learning comes in a few forms: Supervised, Reinforcement, Unsupervised and Semi-Supervised (a close relative of Active Learning). Supervised Learning relies on a process of labeling in order to "understand" information.
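The anomaly-hunting idea described above can be sketched in a few lines. This is a deliberately minimal stand-in for a real unsupervised detector: it learns per-feature statistics from unlabeled "normal" traffic and flags points that deviate sharply, with no labels involved. The feature choices and threshold are invented for illustration.

```python
import math

def fit_baseline(samples):
    """Learn per-feature mean and std from unlabeled traffic features."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    stds = [math.sqrt(sum((s[d] - means[d]) ** 2 for s in samples) / n) or 1.0
            for d in range(dims)]
    return means, stds

def anomaly_score(x, means, stds):
    """Largest per-feature z-score: how far x sits from 'normal' traffic."""
    return max(abs(x[d] - means[d]) / stds[d] for d in range(len(x)))

# Unlabeled 'normal' traffic: (packets/sec, mean packet size) -- toy values
normal = [(100, 500), (110, 480), (95, 520), (105, 510), (98, 495)]
means, stds = fit_baseline(normal)

print(anomaly_score((102, 505), means, stds) < 3)  # typical traffic
print(anomaly_score((900, 40), means, stds) > 3)   # anomalous burst
```

A production system would use a richer model (for example an isolation forest or autoencoder), but the principle is the same: no prior labels, only deviation from learned normality.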

Bridging the Last Mile in Sim-to-Real Robot Perception via Bayesian Active Learning Artificial Intelligence

Learning from synthetic data is popular in a variety of robotic vision tasks such as object detection, because large amounts of data can be generated without annotations by humans. However, when relying only on synthetic data, we encounter the well-known problem of the simulation-to-reality (Sim-to-Real) gap, which is hard to resolve completely in practice. For such cases, real human-annotated data is necessary to bridge this gap, and in our work we focus on how to acquire this data efficiently. Therefore, we propose a Sim-to-Real pipeline that relies on deep Bayesian active learning and aims to minimize the manual annotation effort. We devise a learning paradigm that autonomously selects the data that is considered useful for the human expert to annotate. To achieve this, a Bayesian Neural Network (BNN) object detector providing reliable uncertainty estimates is adapted to infer the informativeness of the unlabeled data, in order to perform active learning. In our experiments on two object detection data sets, we show that the labeling effort required to bridge the reality gap can be reduced to a small amount. Furthermore, we demonstrate the practical effectiveness of this idea in a grasping task on an assistive robot.
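The core mechanism, scoring unlabeled data by a Bayesian model's predictive uncertainty and sending the most informative items to the annotator, can be sketched as follows. The "BNN" here is simulated by repeated noisy forward passes (in the spirit of MC dropout); the detector, image names and confidence values are all invented for illustration.

```python
import math
import random

random.seed(0)

def mc_predictions(base_conf, passes=50):
    """Stand-in for a BNN's stochastic forward passes (e.g. MC dropout):
    each pass returns a probability that the image contains the object."""
    return [min(1.0, max(0.0, base_conf + random.gauss(0, 0.1)))
            for _ in range(passes)]

def predictive_entropy(probs):
    """Binary predictive entropy of the mean prediction: high entropy
    means the model is unsure, so annotation is most informative."""
    p = sum(probs) / len(probs)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# Unlabeled real images with the simulated detector's base confidence
pool = {"img_a": 0.95, "img_b": 0.50, "img_c": 0.05}
scores = {k: predictive_entropy(mc_predictions(v)) for k, v in pool.items()}
to_annotate = max(scores, key=scores.get)
print(to_annotate)  # the most uncertain image goes to the human expert
```

The ambiguous image (base confidence near 0.5) yields the highest entropy and is selected, while confidently classified images stay unlabeled, which is exactly how the annotation budget gets concentrated on the Sim-to-Real gap.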

A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification Machine Learning

Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling), as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy queries annotations intelligently from annotators to train a high-performance classification model at a low annotation cost. Traditional AL strategies operate in an idealized framework. They assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond if tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ in at least one of the following three central aspects from traditional AL: (1) They explicitly consider (multiple) human annotators whose performances can be affected by various factors, such as missing expertise. (2) They generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule. (3) They take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. To this end, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL.
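The two selection steps the survey names, query selection and annotator selection, can be sketched together in a toy pool-based loop. Everything here (error rates, costs, the routing rule) is an invented illustration of the real-world AL setting, not an algorithm from the survey.

```python
# Hypothetical annotators as (error rate, cost per label): real-world AL
# weighs performance against cost instead of assuming one perfect oracle.
annotators = {"expert": (0.05, 5.0), "crowd": (0.30, 1.0)}

def uncertainty(p):
    """Margin-based uncertainty for a binary classifier: 1.0 at p = 0.5."""
    return 1.0 - abs(p - 0.5) * 2

def pick_annotator(difficulty):
    """Route hard queries to the expert, easy ones to cheap crowd workers."""
    return "expert" if difficulty > 0.5 else "crowd"

# Unlabeled pool with the current model's predicted probabilities
pool = {"x1": 0.92, "x2": 0.51, "x3": 0.15}

# Query selection: most uncertain instance first
query = max(pool, key=lambda k: uncertainty(pool[k]))
# Annotator selection: match query difficulty to annotator quality/cost
who = pick_annotator(uncertainty(pool[query]))
print(query, who)
```

The near-ambiguous instance is queried and, being difficult, routed to the expensive but reliable expert; a traditional AL strategy would have had no annotator-selection step at all.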

Active Learning for Argument Strength Estimation Artificial Intelligence

High-quality arguments are an essential part of decision-making. Automatically predicting the quality of an argument is a complex task that recently got much attention in argument mining. However, the annotation effort for this task is exceptionally high. Therefore, we test uncertainty-based active learning (AL) methods on two popular argument-strength data sets to estimate whether sample-efficient learning can be enabled. Our extensive empirical evaluation shows that uncertainty-based acquisition functions cannot surpass the accuracy reached with random acquisition on these data sets.

Improving Robustness and Efficiency in Active Learning with Contrastive Loss Artificial Intelligence

This paper introduces supervised contrastive active learning (SCAL), which leverages the contrastive loss for active learning in a supervised setting. We propose efficient query strategies in active learning to select unbiased and informative data samples with diverse feature representations. We demonstrate that our proposed method reduces sampling bias and achieves state-of-the-art accuracy and model calibration in an active learning setup, with query computation 11x faster than CoreSet and 26x faster than Bayesian active learning by disagreement. Our method yields well-calibrated models even on imbalanced datasets. We also evaluate robustness to dataset shift and out-of-distribution data in the active learning setup, and demonstrate that our proposed SCAL method outperforms high-performing, compute-intensive methods by a large margin (on average 8.9% higher AUROC for out-of-distribution detection and 7.2% lower ECE under dataset shift).
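The supervised contrastive loss the method builds on can be shown in a minimal pure-Python sketch: embeddings with the same label are pulled together relative to all others, so well-separated classes yield a lower loss than mixed-up ones. The tiny 2-D embeddings below are invented; a real implementation would operate on normalized network features in a deep-learning framework.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def supcon_loss(embeddings, labels, tau=0.5):
    """Supervised contrastive loss: for each anchor, same-label points are
    positives and everything else is contrasted away. Embeddings are
    assumed L2-normalized."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = sum(math.exp(dot(embeddings[i], embeddings[a]) / tau)
                    for a in range(n) if a != i)
        for p in pos:
            total -= math.log(
                math.exp(dot(embeddings[i], embeddings[p]) / tau) / denom
            ) / len(pos)
    return total / n

labels = [0, 0, 1, 1]
tight = [(1, 0), (1, 0), (0, 1), (0, 1)]  # classes cleanly separated
mixed = [(1, 0), (0, 1), (1, 0), (0, 1)]  # classes entangled
print(supcon_loss(tight, labels) < supcon_loss(mixed, labels))  # True
```

Because the loss directly shapes the feature space by class, query strategies operating in that space can cheaply find informative, diverse samples, which is the intuition behind SCAL's speedup over methods like CoreSet.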

Why you should be using active learning to build ML / Humanloop blog


Data labelling is often the biggest bottleneck in machine learning: finding, managing and labelling vast quantities of data to build a sufficiently performant model can take weeks or months. Active learning lets you train machine learning models with much less labelled data, and we think you should be using it. In this post we'll explain what active learning is, discuss tools for using it in practice, and show what we're doing at Humanloop to make it easier for you to incorporate active learning into NLP. Imagine that you wanted to build a spam filter for your emails.
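The spam-filter thought experiment can be sketched concretely: instead of labelling every email, label only the ones the current model is least sure about. The keyword "model" and the example emails below are invented purely for illustration.

```python
SPAM_WORDS = {"winner", "free", "prize"}

def spam_prob(email):
    """Toy model: fraction of words that look spammy."""
    words = email.lower().split()
    return sum(w in SPAM_WORDS for w in words) / len(words)

unlabeled = [
    "winner winner free prize now",  # clearly spam, score 0.8
    "meeting notes attached",        # clearly ham, score 0.0
    "free pizza friday",             # ambiguous, score ~0.33
]

# Uncertainty sampling: query the email whose score is closest to 0.5,
# since that label teaches the model the most.
query = min(unlabeled, key=lambda e: abs(spam_prob(e) - 0.5))
print(query)
```

The ambiguous "free pizza friday" gets queried while the obvious cases are left alone; scaled up, this is how active learning cuts the labelling bill.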

Active Learning by Acquiring Contrastive Examples Artificial Intelligence

Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting \textit{contrastive examples}, i.e., data points that are similar in the model's feature space but for which the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions on four natural language understanding tasks and seven datasets. Our experiments show that CAL consistently performs at least as well as the best-performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method, and we further analyze all actively acquired datasets, showing that CAL achieves a better trade-off between uncertainty and diversity than other strategies.
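The idea of a contrastive example, similar features but divergent predictions, can be sketched with a toy scorer: for each candidate, compare its predicted probability to those of its nearest neighbors in feature space. The features, probabilities and KL-based score below are an invented illustration of the principle, not the paper's exact formulation.

```python
import math

def kl(p, q, eps=1e-9):
    """KL divergence between two Bernoulli predictive distributions."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def contrastive_score(i, feats, probs, k=2):
    """Mean KL between candidate i's prediction and those of its k
    nearest neighbors in feature space: high when neighbors disagree."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(feats[i], feats[j])), j)
                   for j in range(len(feats)) if j != i)
    neighbors = [j for _, j in dists[:k]]
    return sum(kl(probs[j], probs[i]) for j in neighbors) / k

# Toy pool: 2-D feature vectors and the model's predicted probabilities.
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
probs = [0.95, 0.10, 0.90, 0.50]

scores = [contrastive_score(i, feats, probs) for i in range(len(feats))]
query = max(range(len(feats)), key=lambda i: scores[i])
print(query)  # point 1: near points 0 and 2 yet predicted oppositely
```

Point 1 sits right next to points 0 and 2 in feature space but receives the opposite prediction, so it lands near a decision boundary, exactly the kind of example worth labelling.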

Active Learning for Automated Visual Inspection of Manufactured Products Artificial Intelligence

Quality control is a key activity performed by manufacturing enterprises to ensure products meet quality standards and avoid potential damage to the brand's reputation. The decreased cost of sensors and connectivity enabled an increasing digitalization of manufacturing. In addition, artificial intelligence enables higher degrees of automation, reducing overall costs and time required for defect inspection. In this research, we compare three active learning approaches and five machine learning algorithms applied to visual defect inspection with real-world data provided by Philips Consumer Lifestyle BV. Our results show that active learning reduces the data labeling effort without detriment to the models' performance.