Feedback Coding for Active Learning

Canal, Gregory, Bloch, Matthieu, Rozell, Christopher

arXiv.org Machine Learning 

Active learning is an area of modern machine learning that studies how data points can be sequentially selected for labeling to train a model with as few labeled examples as possible (Settles, 2009). Minimizing the number of labeled examples is critical in any learning scenario where labels are expensive to obtain, such as in healthcare applications where a medical expert must hand-label each training example (Liu, 2004), or where only a limited number of examples can be evaluated, such as in drug discovery (Warmuth et al., 2003). The active selection of data points shares many technical parallels with channel coding with feedback, where a message is encoded into a sequence of symbols transmitted across a noisy channel and each symbol is selected based on the message and past channel outputs. In active learning, the optimal classifier parameters play the role of the "message" while the sequence of examples with noisy labels plays the role of "channel outputs" available through feedback to select the next example for labeling. Both feedback channel coding and active learning seek to minimize the number of encoder actions, leverage a history of noisy observations to select the next most informative action, must account for observation noise, and should operate in a computationally efficient manner. Although there exists a large literature studying the intersection of information theory with machine learning (Xu and Raginsky, 2017) and specifically active learning (Naghshvar et al., 2015), there remain open questions about the best ways to directly leverage techniques in channel coding for active example selection. The main contribution of this work is a formulation of general active learning problems in terms of a feedback coding system, and a demonstration of this approach through the application and analysis of active learning in logistic regression. To motivate this approach, we first examine active learning through the lens of feedback channel coding by identifying communications system components, including a deterministic encoder, noisy channel, channel input constraints, and capacity-achieving distribution. With these components identified, we show how typical structural constraints in active learning problems prevent the direct application of existing feedback coding approaches such as posterior matching (Ma and Coleman, 2011).

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found