Supplementary Materials A Algorithm details

Neural Information Processing Systems

Our innovation of optimizing interval times is highlighted in blue in Algorithm 1. A key assumption of Algorithm 1 is that acting often, using short time intervals, will not hurt performance, i.e., that maximal interaction with the environment is acceptable. In many scenarios this assumption seems reasonable, and applying Algorithm 1 may work well; for example, some Atari games require frameskipping, i.e., repeating actions over several frames. Algorithm 2 assumes that the dynamics can be fully covered by random policies; however, these may be far from the optimal policy.
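To make the interval-optimization idea concrete, here is a hypothetical sketch (not the paper's exact Algorithm 1): given model-predicted returns for a set of candidate interval times, prefer the longest interval whose predicted performance stays within a tolerance of the best. The function name `choose_interval`, the candidate intervals, and the predicted returns are all illustrative assumptions.

```python
# Hypothetical sketch of interval-time selection (not the paper's exact
# Algorithm 1): trade interaction rate against predicted performance.
def choose_interval(predicted_values, candidate_dts, tol=0.05):
    """predicted_values[i] is the model-predicted return when acting every
    candidate_dts[i] time units; prefer longer intervals (fewer environment
    interactions) that sacrifice at most `tol` of the best predicted return."""
    best = max(predicted_values)
    # Iterate from the longest interval down and take the first acceptable one.
    for dt, v in sorted(zip(candidate_dts, predicted_values), reverse=True):
        if v >= best - tol * abs(best):
            return dt
    return min(candidate_dts)

dts = [0.1, 0.2, 0.5, 1.0]
vals = [10.0, 9.9, 9.7, 8.0]   # made-up model-predicted returns
print(choose_interval(vals, dts))  # -> 0.5, longest dt within 5% of best
```

Under these made-up numbers, acting every 0.5 time units loses only 3% of the best predicted return, so the sketch accepts it and halves the interaction rate relative to dt = 0.2.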


Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs

Du, Jianzhun, Futoma, Joseph, Doshi-Velez, Finale

arXiv.org Machine Learning

We present two elegant solutions for modeling continuous-time dynamics, in a novel model-based reinforcement learning (RL) framework for semi-Markov decision processes (SMDPs), using neural ordinary differential equations (ODEs). Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data. We also develop a model-based approach for optimizing time schedules to reduce interaction rates with the environment while maintaining near-optimal performance, which is not possible for model-free methods. We experimentally demonstrate the efficacy of our methods across various continuous-time domains.
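To illustrate why an ODE-based dynamics model suits SMDPs, here is a minimal sketch, assuming a linear stand-in for the learned dynamics function (the paper trains a neural network instead). Because the model integrates a continuous-time derivative, it can predict the next state over an arbitrary time interval, and predictions over two half-intervals compose to match one full-interval prediction, a consistency property that fixed-step discrete-time models lack.

```python
import numpy as np

# Stand-in for a learned neural ODE: ds/dt = f(s, a). The paper uses a
# neural network; a fixed linear function keeps this sketch self-contained.
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(4, 4))   # state dynamics (illustrative)
B = rng.normal(scale=0.1, size=(4, 2))   # action influence (illustrative)

def f(s, a):
    """Continuous-time derivative ds/dt, here a linear stand-in."""
    return A @ s + B @ a

def predict(s, a, dt, n_steps=20):
    """Integrate the ODE forward by an arbitrary interval dt using RK4.
    Variable dt per transition is what makes this usable for SMDPs."""
    h = dt / n_steps
    for _ in range(n_steps):
        k1 = f(s, a)
        k2 = f(s + 0.5 * h * k1, a)
        k3 = f(s + 0.5 * h * k2, a)
        k4 = f(s + h * k3, a)
        s = s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return s

s0 = np.ones(4)
a = np.zeros(2)
# Two half-interval predictions compose to (approximately) one full interval.
s_full = predict(s0, a, 1.0)
s_half = predict(predict(s0, a, 0.5), a, 0.5)
print(np.allclose(s_full, s_half, atol=1e-5))
```

In practice one would replace the hand-rolled RK4 loop with an adaptive solver (e.g., SciPy's `solve_ivp` or a differentiable ODE solver) and backpropagate through the integration to train the network.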