Adjust Planning Strategies to Accommodate Reinforcement Learning Agents
arXiv.org Artificial Intelligence
The solution of many continuous decision problems can be described as the following process: an agent sets out from an initial state, passes through a series of intermediate states, and finally reaches the goal state. Imagine an agent in a maze that must find certain key positions and pass through them one by one to get out. The agent has two types of behavior. One is the micro-level action taken at every state, analogous to muscle activity, which we call reaction; the other is the change in the trend of reactions taken over a period of time, analogous to human thought, which we call planning [15]. For the agent in the maze, a reaction can be each small moving step, and planning can be each decision about which position it should reach next. In a complicated scene with a high-dimensional data stream, a long-horizon decision process, and sparse supervision signals, an agent trained only to react [9, 10] can hardly perform well (see Appendix A for a demonstration). However, combining reaction and planning [3, 4, 14] can significantly improve its capability. The essence of this improvement is that the agent has limited reaction capability, and the introduction of planning frees the agent from having to react across the whole task.
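The maze example above can be sketched in code. The following is a minimal illustration (not the paper's method) of the two behavior levels: a planning function that chooses the next key position to head for, and a reaction function that takes one small step toward it. All names and the greedy step rule are illustrative assumptions.

```python
def plan(position, remaining_targets):
    """Planning level: decide which position to reach next.
    Here, simply the next unvisited target (illustrative)."""
    return remaining_targets[0]

def react(position, subgoal):
    """Reaction level: one small step that moves toward the
    current subgoal (greedy move along one axis)."""
    x, y = position
    gx, gy = subgoal
    if x != gx:
        return (x + (1 if gx > x else -1), y)
    if y != gy:
        return (x, y + (1 if gy > y else -1))
    return position

def run_episode(start, keys, goal, max_steps=100):
    """Alternate planning and reaction until all key positions
    and finally the goal are reached."""
    position = start
    targets = list(keys) + [goal]
    steps = 0
    while targets and steps < max_steps:
        subgoal = plan(position, targets)    # planning: pick next key position
        position = react(position, subgoal)  # reaction: one small moving step
        if position == subgoal:
            targets.pop(0)                   # subgoal reached; plan the next one
        steps += 1
    return position, steps

# Agent starts at (0, 0), must pass keys (2, 1) and (4, 3), then exit at (5, 5).
final, steps = run_episode((0, 0), [(2, 1), (4, 3)], (5, 5))
```

The point of the sketch is the division of labor: the reaction policy only ever solves the short, local problem of reaching the current subgoal, while planning handles the long-horizon structure of the task.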
Mar-18-2020