Learning Graphical Models
Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments
Mahdi Imani, Seyede Fatemeh Ghoreishi, Ulisses M. Braga-Neto
We propose a Bayesian decision-making framework for controlling Markov decision processes (MDPs) in data-poor environments. The MDPs have unknown dynamics and large, possibly continuous, state, action, and parameter spaces. Most existing adaptive controllers for MDPs with unknown dynamics are based on the reinforcement learning framework. They rely on large data sets acquired through sustained direct interaction with the system or through a simulator. This is not feasible in many applications because of ethical, economic, and physical constraints. The proposed framework addresses data poverty by splitting the problem into two stages: an offline planning stage, which does not rely on sustained direct interaction with the system or a simulator, and an online execution stage. In the offline stage, parallel Gaussian process temporal difference (GPTD) learning techniques are used to compute a near-optimal Bayesian approximation of the expected discounted reward over a sample drawn from the prior distribution of the unknown parameters. In the online stage, the controller selects the action with the maximum expected return with respect to the posterior distribution of the parameters. To do this, the posterior distribution is approximated with a Markov chain Monte Carlo (MCMC) algorithm. Multiple Gaussian processes are then constructed over the parameter space to efficiently predict the mean expected return at the MCMC samples. The effectiveness of the proposed framework is demonstrated on a simple dynamical system model with continuous state and action spaces. It is also demonstrated on a more complex model of a metastatic melanoma gene regulatory network observed through noisy synthetic gene expression data.
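The online selection step described above can be illustrated with a small, self-contained sketch. It rests on several simplifying assumptions that are not taken from the paper. The unknown parameter is a scalar. The current state is fixed and left implicit. The offline stage is replaced by given return estimates at a handful of prior samples, instead of actual GPTD learning. One plain GP regression per action is fitted over the parameter space. The MCMC step uses random-walk Metropolis-Hastings, whereas the abstract names no specific MCMC variant. All names (`GPMean`, `metropolis`, `select_action`, the toy actions `"left"`/`"right"`, and the Gaussian toy posterior) are hypothetical.

```python
import math
import random

# Hypothetical toy setup (NOT the paper's method): scalar parameter theta,
# two actions, and offline return estimates at a few prior samples.

def rbf(x, y, length=0.5, var=1.0):
    """Squared-exponential kernel on a scalar parameter."""
    return var * math.exp(-0.5 * ((x - y) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

class GPMean:
    """Posterior mean of a zero-mean GP fitted to (theta_j, return_j) pairs."""
    def __init__(self, thetas, values, noise=1e-4):
        self.thetas = thetas
        K = [[rbf(a, b) + (noise if i == j else 0.0)
              for j, b in enumerate(thetas)] for i, a in enumerate(thetas)]
        self.alpha = solve(K, values)  # alpha = (K + noise*I)^-1 y

    def __call__(self, theta):
        return sum(w * rbf(theta, t) for w, t in zip(self.alpha, self.thetas))

def metropolis(log_post, init, n, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings chain for a scalar parameter."""
    rng = random.Random(seed)
    x, lp = init, log_post(init)
    samples = []
    for _ in range(n):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

def select_action(gps, posterior_samples):
    """Average each action's GP-predicted return over the posterior samples."""
    scores = {a: sum(gp(t) for t in posterior_samples) / len(posterior_samples)
              for a, gp in gps.items()}
    return max(scores, key=scores.get), scores

# Offline (assumed given): returns at prior samples of theta.
prior_thetas = [-2.0 + 0.5 * k for k in range(9)]
offline_returns = {"left": [-t for t in prior_thetas],
                   "right": list(prior_thetas)}
gps = {a: GPMean(prior_thetas, v) for a, v in offline_returns.items()}

# Online: toy log-posterior centred at theta = 1.0, then MCMC (burn-in dropped).
def log_post(t):
    return -0.5 * ((t - 1.0) / 0.3) ** 2

samples = metropolis(log_post, init=0.0, n=2000)[500:]
best, scores = select_action(gps, samples)
```

In this sketch, the GPs are fitted once, before any data arrive. Each new posterior sample therefore costs only a kernel evaluation rather than a fresh planning run, which reflects the offline/online split the abstract describes.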
Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks
In the presence of competing risks, it is important to model not only the time to event but also the event type, that is, the risk under which the event is likely to occur first. One could treat subjects whose event occurs under a competing risk, rather than the risk of special interest, as censored. Any survival model that handles censoring could then be applied to competing risks. However, this approach is problematic because it violates the principle of non-informative censoring [18, 19].
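The recoding described above can be shown in a few lines. The paragraph argues against this recoding; it is shown only to make it concrete. The sketch assumes a hypothetical record layout: each record is a `(time, event_type)` pair, where `event_type = 0` means the subject was censored and `1..K` means the event occurred first under risk `k`. The helper name `censor_competing` is illustrative only.

```python
from typing import List, Tuple

# Hypothetical encoding: (time, event_type), 0 = censored, 1..K = risk k.
Record = Tuple[float, int]

def censor_competing(records: List[Record], risk: int) -> List[Tuple[float, bool]]:
    """Recode competing-risks data for a single risk of interest.

    Events under any other risk are treated as censored at their event time,
    producing (time, observed) pairs that any censoring-aware model accepts.
    """
    return [(t, k == risk) for t, k in records]

data = [(2.0, 1), (3.5, 2), (4.0, 0), (1.2, 1), (5.1, 2)]
risk1 = censor_competing(data, risk=1)
```

The resulting data look like ordinary right-censored data. However, subjects who experienced risk 2 were removed from the risk set for a reason that may be related to their risk-1 event time, so the censoring is generally informative.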