Goto

Collaborating Authors

 Reinforcement Learning


Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes

Neural Information Processing Systems

We study sequential decision-making problems in which each agent aims to maximize the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon Constrained Markov Decision Processes (CMDPs) problem. Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method for CMDPs which updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent.


GRASP: NavigatingRetrosyntheticPlanningwith Goal-drivenPolicy

Neural Information Processing Systems

Retrosynthetic planning occupies a crucial position in synthetic chemistry and, accordingly, drug discovery, which aims to find synthetic pathways of a target molecule through a sequential decision-making process on a set of feasible reactions.






Few

Neural Information Processing Systems

Deep reinforcement learning (RL) algorithms have demonstrated promising results on a variety of complex tasks, such as robotic manipulation [22, 13] and strategy games [27, 38].