He, Yongming
Ensemble Reinforcement Learning: A Survey
Song, Yanjie, Suganthan, P. N., Pedrycz, Witold, Ou, Junwei, He, Yongming, Chen, Yingwu, Wu, Yutong
Reinforcement Learning (RL) has emerged as a highly effective technique for addressing various scientific and applied problems. Despite its success, certain complex tasks remain difficult to address with a single model and algorithm alone. In response, ensemble reinforcement learning (ERL), a promising approach that combines the benefits of both RL and ensemble learning (EL), has gained widespread popularity. ERL leverages multiple models or training algorithms to explore the problem space comprehensively and possesses strong generalization capabilities. In this study, we present a comprehensive survey of ERL to provide readers with an overview of recent advances and challenges in the field. First, we introduce the background and motivation for ERL. Second, we analyze in detail strategies, such as model selection and combination, that have been successfully implemented in ERL. We then explore the applications of ERL, summarize the datasets used, and analyze the algorithms employed. Finally, we outline several open questions and discuss future research directions for ERL. By offering guidance for future scientific research and engineering applications, this survey contributes to the advancement of ERL.
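As a concrete illustration of the kind of combination strategy such a survey covers, the sketch below trains several independent tabular Q-learners that differ only in their random seeds and merges their value estimates by simple averaging. The toy chain MDP, the hyperparameters, and the averaging rule are illustrative assumptions made for this sketch, not methods taken from the survey itself.

```python
# Minimal ERL sketch: average the Q-estimates of independently trained
# learners. The chain MDP and all hyperparameters are illustrative.
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # chain of 5 states; actions: 0 = left, 1 = right

def step(state, action):
    """Toy chain MDP: moving right from the last state yields reward 1."""
    if action == 1:
        if state == N_STATES - 1:
            return 0, 1.0, True      # goal reached; episode ends
        return state + 1, 0.0, False
    return max(state - 1, 0), 0.0, False

def train_q_learner(seed, episodes=500, alpha=0.1, gamma=0.95, eps=0.3):
    """One ensemble member: plain epsilon-greedy tabular Q-learning."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(np.argmax(q[s]))
            s2, r, done = step(s, a)
            target = r + gamma * (0.0 if done else q[s2].max())
            q[s, a] += alpha * (target - q[s, a])
            s = s2
    return q

# Train an ensemble whose members differ only in random seed, then combine
# by averaging Q-values (majority voting over greedy actions and weighted
# combination are common alternatives).
ensemble = [train_q_learner(seed) for seed in range(5)]
q_mean = np.mean(ensemble, axis=0)
print("ensemble greedy policy per state:", q_mean.argmax(axis=1))
```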
A Two-stage Framework and Reinforcement Learning-based Optimization Algorithms for Complex Scheduling Problems
He, Yongming, Wu, Guohua, Chen, Yingwu, Pedrycz, Witold
No general solver is efficient across scheduling problems, owing to their diversity and complexity. In this study, we develop a two-stage framework in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined to deal efficiently with complex scheduling problems. The scheduling problem is solved in two stages: a finite Markov decision process (MDP) and a mixed-integer programming process. This offers a novel and general paradigm that combines RL with OR approaches to solve scheduling problems, leveraging the respective strengths of each: the MDP narrows down the search space of the original problem through an RL method, while the mixed-integer programming process is solved by an OR algorithm. The two stages are performed iteratively and interactively until a termination criterion is met. Following this idea, two implementations of the combined RL and OR method are put forward. The agile Earth observation satellite scheduling problem is selected as an example to demonstrate the effectiveness of the proposed scheduling framework and methods. The convergence and generalization capability of the methods are verified on training scenarios, while their efficiency and accuracy are tested on 50 untrained scenarios. The results show that the proposed algorithms stably and efficiently obtain satisfactory schedules for agile Earth observation satellite scheduling problems. Moreover, RL-based optimization algorithms exhibit stronger scalability than non-learning algorithms. This work reveals the advantage of combining reinforcement learning with heuristic or mathematical programming methods for solving complex combinatorial optimization problems.
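To make the control flow of such a two-stage loop concrete, the sketch below alternates an RL-style selection step, which narrows the task set, with an exact solve over the reduced set. The knapsack-like toy problem, the score-based sampling "policy", and the brute-force stand-in for the mixed-integer programming solver are all hypothetical; the sketch shows only the iterative two-stage structure described in the abstract, not the paper's actual algorithms.

```python
# Structural sketch of an RL + OR two-stage loop. Everything here is a
# stand-in: stage 1 mimics a learned selection policy, stage 2 mimics a
# MIP solver with exhaustive search over the reduced subset.
import itertools
import random

TASKS = [(5, 3), (8, 5), (3, 2), (9, 7), (4, 3)]   # (profit, duration)
CAPACITY = 10                                       # total time budget

def stage1_select(scores, k=3):
    """Stage 1 (RL surrogate): sample k candidate tasks, biased by scores."""
    idx = list(range(len(TASKS)))
    return random.choices(idx, weights=scores, k=k)

def stage2_solve(subset):
    """Stage 2 (OR surrogate): exact search over the reduced subset,
    standing in for a mixed-integer programming solver."""
    best_profit, best_plan = 0, ()
    for r in range(len(subset) + 1):
        for plan in itertools.combinations(set(subset), r):
            if sum(TASKS[i][1] for i in plan) <= CAPACITY:
                profit = sum(TASKS[i][0] for i in plan)
                if profit > best_profit:
                    best_profit, best_plan = profit, plan
    return best_profit, best_plan

scores = [1.0] * len(TASKS)       # the "policy", updated from stage-2 feedback
best = (0, ())
for _ in range(20):               # iterate both stages under a fixed budget
    subset = stage1_select(scores)
    profit, plan = stage2_solve(subset)
    if profit > best[0]:
        best = (profit, plan)
    for i in plan:                # crude policy update: reinforce tasks
        scores[i] += 0.5          # that appeared in good schedules
print("best profit:", best[0], "tasks:", sorted(best[1]))
```

The point of the structure is the interaction: the exact solver only ever sees a small subset, which keeps each solve cheap, while its results feed back into the selection scores so later iterations narrow the search space more intelligently.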