Towards Intrinsic Interactive Reinforcement Learning

Benjamin Poole, Minwoo Lee

arXiv.org Artificial Intelligence 

Meanwhile, applications of RL have only begun to expand beyond these constrained game environments to more diverse and complex real-world environments such as chip design [86], chemical reaction optimization [133], and long-term recommendation [45]. To make further progress toward these more complex real-world environments, the challenges currently facing RL (e.g., generalization, robustness, scalability, and safety) must be further alleviated [7, 27, 72, 108]. Moreover, we can expect that as the complexity of environments increases, the difficulty of alleviating these challenges will increase as well [27]. For the purposes of this paper, we broadly categorize known RL challenges as either aptitude or alignment problems. Aptitude encompasses challenges concerned with being able to learn. Aptitude includes ideas such as robustness, the ability of RL to perform a task (e.g., asymptotic performance) and to generalize within and between environments of similar complexity; scalability, the ability of RL to scale up to more complex environments; and aptness, the rate at which an RL algorithm can learn to solve a problem or achieve a desired performance level. Likewise, alignment encompasses challenges concerned with learning as intended [7, 27, 72]. The hypothetical paperclip agent [18], which maximizes paperclip production at the expense of everything its designers actually value, is a classic example of misalignment.
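To make the notion of misalignment concrete, the sketch below shows a toy analogue of the paperclip problem: reward misspecification. The environment (a 1-D corridor with a "token" cell), the bonus value, and all hyperparameters are illustrative assumptions, not anything proposed in this paper; the point is only that a tabular Q-learning agent that faithfully maximizes a misspecified proxy reward can learn a policy that earns nothing under the reward the designer actually intended.

```python
import numpy as np

# Minimal sketch of reward misspecification (hypothetical toy problem, not from
# the paper): the designer intends the agent to reach the goal cell, but the
# proxy reward also pays a bonus for stepping on a "token" cell. A tabular
# Q-learning agent maximizing the proxy learns to oscillate around the token
# instead of completing the task -- a toy analogue of paperclip-style misalignment.

N_STATES = 5            # 1-D corridor, cells 0..4
GOAL, TOKEN = 4, 1      # goal is terminal; token carries the misspecified bonus
ACTIONS = [-1, +1]      # move left / move right

def proxy_reward(s):
    if s == GOAL:
        return 1.0
    if s == TOKEN:
        return 0.3      # unintended bonus the designer did not expect to dominate
    return 0.0

def true_reward(s):
    return 1.0 if s == GOAL else 0.0   # what the designer actually wants

def step(s, a):
    return min(max(s + ACTIONS[a], 0), N_STATES - 1)

def q_learning(reward_fn, episodes=3000, horizon=20, gamma=0.95,
               alpha=0.1, eps=0.2, seed=0):
    Q = np.zeros((N_STATES, len(ACTIONS)))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # epsilon-greedy action selection
            a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
            s_next = step(s, a)
            r = reward_fn(s_next)
            target = r if s_next == GOAL else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])
            if s_next == GOAL:
                break
            s = s_next
    return Q

def evaluate(Q, reward_fn, horizon=20):
    # roll out the greedy policy and sum the given reward signal
    s, total = 0, 0.0
    for _ in range(horizon):
        s = step(s, int(Q[s].argmax()))
        total += reward_fn(s)
        if s == GOAL:
            break
    return total

Q = q_learning(proxy_reward)
print("greedy return under the proxy reward:   ", evaluate(Q, proxy_reward))
print("greedy return under the intended reward:", evaluate(Q, true_reward))
```

Under these assumed settings, the proxy-trained policy typically loops over the token cell and never reaches the goal, so its return under the intended reward is zero even though its proxy return is high; the agent is fully "apt" at optimizing the objective it was given, yet misaligned with the objective that was meant.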