Planning with a Learned Policy Basis to Optimally Solve Complex Tasks
Guillermo Infante, David Kuric, Anders Jonsson, Vicenç Gómez, Herke van Hoof
arXiv.org Artificial Intelligence
Autonomous agents that interact with an environment usually face tasks that comprise complex, entangled behaviors over long horizons. Conventional reinforcement learning (RL) methods have successfully addressed this. However, in cases when the agent is meant to perform several tasks across similar environments, training a policy for every task separately can be time-consuming and requires a lot of data. In such cases, the agent can utilize a method that has built-in generalization capabilities. One such method relies on the assumption that reward functions of these tasks can be decomposed into a linear combination of successor features (Barreto et al.).

To alleviate this issue, one can consider methods that condition the policy or the value function on the specification of the whole task (Schaul et al. 2015), and such approaches were recently also proposed for tasks with non-Markovian reward functions (Vaezipoor et al. 2021). However, the methods that specify the whole task usually rely on a blackbox neural network for planning when determining which sub-goal to reach next. This makes it hard to interpret the plan to solve the task, and although they show promising results in practice, it is unclear whether and when these approaches will generalize to a new task.
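The successor-feature decomposition mentioned above (Barreto et al.) can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the feature dimensions, the random successor features `psi`, and the task weight vectors `w_task_a`/`w_task_b` are all assumptions made up for the example.

```python
import numpy as np

# Successor-feature (SF) sketch: if the reward decomposes as r(s, a) = phi(s, a) . w,
# then the action-value of a fixed policy pi decomposes as Q_pi(s, a) = psi_pi(s, a) . w,
# where psi_pi(s, a) = E[ sum_t gamma^t * phi(s_t, a_t) ] are the successor features.
# All quantities below are toy assumptions for illustration only.

rng = np.random.default_rng(0)
n_states, n_actions, n_features = 4, 2, 3

# Pretend these successor features were learned for some policy pi.
psi = rng.random((n_states, n_actions, n_features))

w_task_a = np.array([1.0, 0.0, 0.0])   # task A rewards only feature 0
w_task_b = np.array([0.0, 0.5, 0.5])   # task B: same psi reused, no retraining

# Evaluating pi on a new task reduces to a dot product with the task weights.
q_a = psi @ w_task_a   # shape (n_states, n_actions)
q_b = psi @ w_task_b

print(q_a.shape, q_b.shape)
```

Because `psi` does not depend on the task weights, the same learned policy basis can be re-evaluated on any task whose reward is linear in the features, which is the generalization property the abstract refers to.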
Jun-3-2024