The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning
Soltoggio, Andrea, Ben-Iwhiwhu, Eseoghene, Peridis, Christos, Ladosz, Pawel, Dick, Jeffery, Pilly, Praveen K., Kolouri, Soheil
arXiv.org Artificial Intelligence
Many real-world problems are characterized by a large number of observations, confounding and spurious correlations, partially observable states, and distal, dynamic rewards with hierarchical reward structures. Such conditions make it hard for both animals and machines to learn complex skills. The learning process requires discovering what is important and what can be ignored, how the reward function is structured, and how to reuse knowledge across different tasks that share common properties. For these reasons, the application of standard reinforcement learning (RL) algorithms (Sutton and Barto, 2018) to structured problems is often not effective. Limitations of current RL algorithms include the problem of exploration with sparse rewards (Pathak et al., 2017), dealing with partially observable Markov decision processes (POMDPs) (Ladosz et al., 2021), coping with large amounts of confounding stimuli (Thrun, 2000; Kim et al., 2019), and reusing skills for efficiently learning multiple tasks in a lifelong learning setting (Mendez and Eaton, 2020). Standard reinforcement learning algorithms are best suited to problems that can be formulated as single-task, fully observable Markov decision processes (MDPs). Under these assumptions, with complete observability and with static and frequent rewards, deep reinforcement learning (DRL) (Mnih et al., 2015; Li, 2017) gained popularity due to its ability to learn an approximate Q-value function directly from raw pixel data on the Atari 2600 platform. This and similar algorithms stack multiple frames to approximate MDP states and use a basic ɛ-greedy exploration policy. In more complex cases with partial observability and sparse rewards, extensions have been proposed to include more advanced exploration techniques (Ladosz et al., 2022), e.g.
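For readers unfamiliar with the two mechanisms mentioned above, the sketch below illustrates frame stacking to approximate an MDP state and ɛ-greedy action selection over estimated Q-values. It is a minimal illustration under assumed shapes and names (`stack_frames`, `epsilon_greedy`, an 84x84 observation, 6 actions), not code from the paper or from any specific DRL library.

```python
# Minimal sketch (illustrative only) of frame stacking and epsilon-greedy
# action selection, two ingredients of the DQN-style setup described above.
from collections import deque
import numpy as np

def stack_frames(frame_buffer: deque, new_frame: np.ndarray, k: int = 4) -> np.ndarray:
    """Append the newest observation and return the last k frames as one state."""
    frame_buffer.append(new_frame)
    while len(frame_buffer) < k:          # pad at the start of an episode
        frame_buffer.append(new_frame)
    return np.stack(list(frame_buffer)[-k:], axis=0)

def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """With probability epsilon take a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Illustrative usage with a made-up 84x84 observation and 6 discrete actions.
rng = np.random.default_rng(0)
buffer = deque(maxlen=4)
state = stack_frames(buffer, np.zeros((84, 84)), k=4)    # state shape: (4, 84, 84)
action = epsilon_greedy(np.zeros(6), epsilon=0.1, rng=rng)
```

Stacking recent frames gives the agent short-term memory of motion, which is why purely reactive Q-learning can treat Atari observations as approximately Markovian; under partial observability and sparse rewards, as the abstract notes, this simple recipe is no longer sufficient.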
Jan-21-2023