 policy evaluation

On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

Neural Information Processing Systems

Risk-averse reinforcement learning (RL) seeks to provide a risk-averse policy for high-stakes real-world decision problems. These high-stakes domains include autonomous driving (Jin et al., 2019; Sharma et al., 2020), robot collision avoidance (Ahmadi et al., 2021; Hakobyan and Yang, 2021),


Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding

Chenhao Zhou

Neural Information Processing Systems

In this paper, we propose a provably efficient natural policy gradient algorithm called Spectral Dynamic Embedding Policy Optimization (SDEPO) for two-player zero-sum stochastic Markov games with continuous state space and finite action space. In the policy evaluation procedure of our algorithm, a novel kernel embedding method is employed to construct a finite-dimensional linear approximation to the state-action value function.