Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information

Zhang, Ziyi, Nakahira, Yorie, Qu, Guannan

arXiv.org Artificial Intelligence 

Policy design for non-stationary Markov Decision Processes (MDPs) has long been challenging because the system dynamics and rewards vary over time, leaving the learner uncertain about future rewards and transitions. Fortunately, exogenous predictions are available in many applications. For example, in energy systems, look-ahead information is available in the form of renewable generation forecasts and demand forecasts [Amin et al., 2019]. It is natural to design an algorithm that exploits this information to concentrate energy usage in the time frames with the lowest energy prices, thereby lowering the overall energy cost. As another example, smart servers can predict future internet traffic from historical data [Katris and Daskalaki, 2015]. If the server aims to minimize the average waiting time across all tasks, then under light traffic the average waiting time is reduced most by using only the fastest server; if heavy traffic is forecast, however, all servers should run to shorten the queue. Although policy adaptation in time-varying environments has been extensively studied [Auer et al., 2008; Richards et al., 2021; Zhang et al., 2024; Gajane et al., 2018], these works typically do not take advantage of exogenous predictions.
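The energy-system example can be made concrete with a minimal sketch (not from the paper; the prices and demand are hypothetical): given a look-ahead forecast of per-slot energy prices, a predictive policy places a fixed amount of usage in the cheapest slots, while a myopic policy without predictions simply uses the earliest slots.

```python
# Toy illustration of using look-ahead price forecasts to schedule
# a fixed energy demand into the cheapest time slots.
prices = [0.30, 0.12, 0.25, 0.10, 0.28]  # hypothetical forecast ($/kWh per slot)
demand_slots = 2                          # number of slots of usage required

# With look-ahead: pick the slots with the lowest forecast prices.
cheapest = sorted(range(len(prices)), key=lambda t: prices[t])[:demand_slots]
lookahead_cost = sum(prices[t] for t in cheapest)

# Without look-ahead: a myopic policy uses the earliest available slots.
myopic_cost = sum(prices[:demand_slots])

print(lookahead_cost, myopic_cost)  # 0.22 0.42
```

The gap between the two costs is exactly the kind of benefit from look-ahead information that the paper's regret analysis quantifies in a general MDP setting.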
