Markov Models
Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
We study sequential decision-making problems in which each agent aims to maximize the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon Constrained Markov Decision Processes (CMDPs) problem. Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method for CMDPs which updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent.
Finite-TimeAnalysisofRound-Robin Kullback-LeiblerUpperConfidenceBoundsfor OptimalAdaptiveAllocationwithMultiplePlaysand MarkovianRewards
Forouranalysis wedevise several concentration results forMarkovchains, including amaximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish asymptotically optimal, finite-time guarantees for the case of multiple plays, and i.i.d.
Temporally Disentangled Representation Learning under Unknown Nonstationarity Xiangchen Song
However, in nonstationary setting, existing work only partially addressed the problem by either utilizing observed auxiliary variables (e.g., class labels and/or domain indexes) as side-information or assuming simplified latent causal dynamics. Both constrain the method to a limited range of scenarios.