Better Boosting with Bandits for Online Learning
Nikolaou, Nikolaos; Mellor, Joseph; Oza, Nikunj C.; Brown, Gavin
The examples are considered to be of the form $(\mathbf{x}_i, y_i)$, where $\mathbf{x}_i$ is the feature vector of the $i$-th example and $y_i \in \{-1, 1\}$ is its class label. The multiclass case is often handled by decomposing the problem into multiple binary ones, so our analysis and its main results can carry over to it. We consider the online setting, where examples are presented to the learner in $M$ minibatches of size $b$. On the $n$-th iteration the learner performs the following steps:

1. Receive new examples $\mathbf{x}_i$, $\forall \mathbf{x}_i \in$ minibatch $n$.
2. Predict the label $\hat{y}_i$ and/or the probability estimate $\hat{p}(y_i = 1 \mid \mathbf{x}_i)$, $\forall i \in$ minibatch $n$.
3. Get the true labels $y_i = f(\mathbf{x}_i)$, $\forall \mathbf{x}_i \in$ minibatch $n$, where $f$ is the labelling function.
4. Update the learner parameters accordingly.

The steps above are intentionally left general enough to describe all learning components encountered in the paper. Our goal is to study the quality of the probability estimates generated by online boosting ensembles and strategies for improving it. Online boosting ensembles consist of multiple base learners, themselves trained in an online fashion, and, as we will see, the techniques used for improving the probability estimates (both the calibrator and the reward models of the bandits) are also learners trained in an online fashion. All follow the same general approach defined above: they maintain a model with a fixed number of parameters (i.e.
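
The four-step protocol can be sketched as a test-then-train loop over minibatches. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, uses a hypothetical `OnlineLogisticRegression` class (a weight vector plus a bias, i.e. a fixed number of parameters, updated by one SGD step on the log-loss per minibatch) as a stand-in for any online component, and a hypothetical `run_online` driver. Labels follow the $\{-1, 1\}$ convention above.

```python
import numpy as np


class OnlineLogisticRegression:
    """Hypothetical stand-in learner: a fixed number of parameters
    (a weight vector and a bias) updated online by SGD on the log-loss."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.bias = 0.0
        self.lr = lr

    def predict_proba(self, X):
        # Estimate p(y = 1 | x) with the logistic (sigmoid) function.
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.bias)))

    def predict(self, X):
        # Hard labels in {-1, +1}: threshold the probability at 0.5.
        return np.where(self.predict_proba(X) >= 0.5, 1, -1)

    def update(self, X, y):
        # Map labels {-1, +1} to targets {0, 1}, then take one gradient
        # step on the mean log-loss over the minibatch.
        t = (y + 1) / 2
        err = self.predict_proba(X) - t
        self.w -= self.lr * (X.T @ err) / len(y)
        self.bias -= self.lr * err.mean()


def run_online(learner, X, f, b):
    """Hypothetical driver: present X in minibatches of size b and apply
    steps 1-4 to each one. Returns the per-minibatch accuracy measured
    before the update (test-then-train)."""
    accuracies = []
    for start in range(0, len(X), b):
        X_n = X[start:start + b]               # 1. receive minibatch n
        p_hat = learner.predict_proba(X_n)     # 2. probability estimates...
        y_hat = np.where(p_hat >= 0.5, 1, -1)  #    ...and/or hard labels
        y_n = f(X_n)                           # 3. true labels y_i = f(x_i)
        learner.update(X_n, y_n)               # 4. update the parameters
        accuracies.append(float(np.mean(y_hat == y_n)))
    return accuracies
```

For example, with 1,000 two-dimensional examples labelled by a linear $f$ and $b = 50$, `run_online` processes $M = 20$ minibatches. Predicting before the labels are revealed mirrors the order of steps 2 and 3, so each minibatch is scored on data the learner has not yet been trained on.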
Jan-16-2020