Planning in entropy-regularized Markov decision processes and games
Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko
Neural Information Processing Systems
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment.
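For readers unfamiliar with the setting, the sketch below illustrates what an entropy-regularized value looks like at a single state: the hard maximum over actions is replaced by a smooth log-sum-exp. It is not SmoothCruiser itself, and the names `soft_value`, `soft_policy`, and the temperature parameter `lam` are assumptions introduced here for illustration; the abstract does not specify the paper's exact parameterization.

```python
import math


def soft_value(q_values, lam):
    """Entropy-regularized value: lam * log(sum_a exp(q_a / lam)).

    `lam` (assumed name) is the regularization temperature, lam > 0.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(q_values)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def soft_policy(q_values, lam):
    """Softmax policy that attains the entropy-regularized value."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [0.0, 1.0]  # hypothetical action values at one state
    print(soft_value(q, lam=1.0))   # smooth value, above max(q)
    print(soft_value(q, lam=1e-3))  # close to max(q) for small lam
    print(soft_policy(q, lam=1.0))
```

As `lam` shrinks, the smoothed value approaches the usual `max` over actions; larger `lam` rewards more uniform (higher-entropy) policies.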
Jan-23-2025, 15:44:33 GMT