Performance-Bounded Online Ensemble Learning Method Based on Multi-Armed Bandits and Its Applications in Real-Time Safety Assessment

Hu, Songqiao, Liu, Zeyi, He, Xiao

arXiv.org Artificial Intelligence 

Abstract: Ensemble learning plays a crucial role in practical applications of online learning owing to its improved classification performance and flexible adjustment mechanisms. However, most weight allocation strategies in ensemble learning are heuristic, which makes it difficult to guarantee theoretically that the ensemble classifier outperforms its base classifiers. To address this issue, this paper proposes a performance-bounded online ensemble learning method based on multi-armed bandits, named PB-OEL. Specifically, a multi-armed bandit with expert advice is incorporated into online ensemble learning to update the weights of the base classifiers and to make predictions. A theoretical framework is established to bound the performance of the ensemble classifier relative to that of the base classifiers. With an appropriate setting of the bandits' expert advice, the bound exceeds the performance of any base classifier once the data stream is sufficiently long. Performance bounds for scenarios with limited annotations are also derived. Extensive experiments are conducted on benchmark datasets and on a dataset of real-time safety assessment tasks. The experimental results validate the theoretical bound to a certain extent and demonstrate that the proposed method outperforms existing state-of-the-art methods.
Index Terms: Online ensemble learning, performance bound, multi-armed bandits, concept drift, real-time safety assessment.
I. INTRODUCTION
Online learning (OL) holds significant potential for handling continuous data and is widely applied across domains such as industry, recommendation systems, finance, and control systems [1]-[5]. The objective of OL is to learn and update models continuously from new data, so that they adapt to non-stationary environments and yield better predictions or decisions.
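The abstract describes using a multi-armed bandit with expert advice to weight base classifiers and make predictions. To make that mechanism concrete, here is a minimal sketch of the classic EXP4 algorithm in that role. It is not the PB-OEL update, whose details are not given here. The mapping is an assumption made for illustration: arms are class labels, experts are base classifiers whose advice is a class-probability vector, and the reward is 1 when the sampled label is correct. The class name `Exp4Ensemble`, the parameter `gamma`, and the toy usage data are hypothetical.

```python
import math
import random


class Exp4Ensemble:
    """Classic EXP4 used to weight base classifiers (illustrative sketch).

    Assumed mapping (not taken from PB-OEL): arms are the K class labels,
    experts are base classifiers, each expert's advice is a probability
    vector over the K labels, and the reward is 1 if the sampled label is
    correct and 0 otherwise.
    """

    def __init__(self, n_experts, n_classes, gamma=0.1, seed=0):
        self.n_classes = n_classes
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0 / n_experts] * n_experts
        self.rng = random.Random(seed)

    def distribution(self, advice):
        """Mix the weighted expert advice with uniform exploration to get p_t."""
        total = sum(self.weights)
        k = self.n_classes
        return [
            (1.0 - self.gamma)
            * sum(w * a[j] for w, a in zip(self.weights, advice)) / total
            + self.gamma / k
            for j in range(k)
        ]

    def predict(self, advice):
        """Sample a label from p_t; return the label together with p_t."""
        p = self.distribution(advice)
        label = self.rng.choices(range(self.n_classes), weights=p)[0]
        return label, p

    def update(self, advice, label, p, reward):
        """Importance-weighted update: only the sampled label's reward is observed."""
        x_hat = reward / p[label]  # unbiased estimate of the sampled arm's reward
        for i, a in enumerate(advice):
            y_hat = a[label] * x_hat  # estimated reward of expert i
            self.weights[i] *= math.exp(self.gamma * y_hat / self.n_classes)
        # Renormalize to avoid overflow; p_t depends only on relative weights.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


# Usage: expert 0 always advises the true label, expert 1 the wrong one.
ens = Exp4Ensemble(n_experts=2, n_classes=2, gamma=0.1, seed=1)
data_rng = random.Random(42)
for _ in range(500):
    y = data_rng.randrange(2)
    advice = [
        [1.0 if j == y else 0.0 for j in range(2)],
        [0.0 if j == y else 1.0 for j in range(2)],
    ]
    label, p = ens.predict(advice)
    ens.update(advice, label, p, reward=1.0 if label == y else 0.0)
print([round(w, 4) for w in ens.weights])
```

The uniform term `gamma / K` keeps every label's probability positive, so the importance-weighted estimate `reward / p[label]` is always defined. This is what lets the bandit learn from the reward of the sampled label alone. In this toy run nearly all weight should move to the always-correct expert.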
One mainstream approach in OL maintains a set of decision vectors, as exemplified by the perceptron algorithm [6], the passive-aggressive algorithm [7], confidence-weighted algorithms [8], and class-imbalance weighted algorithms [9].
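To illustrate the vector-based family mentioned above, the following minimal sketch shows the textbook perceptron update and the basic passive-aggressive (PA) update for binary labels in {-1, +1}. Several simplifications are assumed here. No bias term is used. Only the hard-margin PA variant is shown, whereas the cited works may use variants such as PA-I/PA-II with an aggressiveness parameter. The helper names and the toy stream are hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_update(w, x, y):
    """Perceptron: on a mistake (y * w.x <= 0), set w <- w + y * x."""
    if y * dot(w, x) <= 0:
        return [wi + y * xi for wi, xi in zip(w, x)]
    return w


def pa_update(w, x, y):
    """Basic passive-aggressive step: w <- w + tau * y * x with
    tau = hinge_loss / ||x||^2, so the updated w reaches margin y * w.x = 1."""
    loss = max(0.0, 1.0 - y * dot(w, x))
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return w  # passive: margin already satisfied (or x is zero)
    tau = loss / sq_norm
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


# Usage: one online pass over a small stream that is separable through the origin.
stream = [([1.0, 2.0], 1), ([2.0, -1.0], -1), ([0.5, 1.5], 1), ([1.5, -0.5], -1)]
w_p = [0.0, 0.0]
w_pa = [0.0, 0.0]
for x, y in stream:
    w_p = perceptron_update(w_p, x, y)
    w_pa = pa_update(w_pa, x, y)
print(w_p, [round(v, 3) for v in w_pa])
```

Both rules keep a single weight vector and touch it only when the incoming sample violates the current decision rule. The perceptron adds a fixed step on mistakes. PA instead takes the smallest step that restores a unit margin.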