Stochastic Reinforcement Learning for Continuous Actions in Dynamic Environments
Shah, Syed Naveed Hussain (Microsoft Corporation ) | Hougen, Dean Frederick (University of Oklahoma)
Reinforcement learning (RL) agents use trial and error to learn action policies for environment states. Environments with continuous action spaces are far more challenging for RL than those with discrete actions because there are infinite possible continuous action values from which to choose. Dynamic environments create additional challenges for RL agents, which must adjust rapidly to changes. We recently introduced REINFORCE SUN, a superclass of REINFORCE with Gaussian units, that allows for stochasticity at different levels of granularity in artificial neural networks (synapse, unit, or network), and have shown that moving stochasticity to synapses greatly aids RL in both static and dynamic environments with continuous action spaces. However, we also found that performance in dynamic environments remained substantially lower than desired. To rectify this, we here consider alternative parameter update equations for learning in dynamic environments. These equations form the core of Stochastic Synapse Reinforcement Learning (SSRL), which we here generalize to create S*RL, a superclass of SSRL that allows for stochasticity at these levels. Empirical results using multi-dimensional robot inverse kinematic data sets show that S*RL update equations greatly outperform traditional REINFORCE equations in dynamic, continuous state and action spaces.
May-16-2020
- Technology: