Symmetry Learning for Function Approximation in Reinforcement Learning
Anuj Mahajan, Theja Tulabandhula
Reinforcement Learning (RL) is the task of training an agent to perform optimally in an environment using the reward and observation signals perceived upon taking actions which change the environment dynamics. Learning optimal behavior is inherently difficult because of challenges like credit assignment and the exploration-exploitation trade-offs that must be made while converging to a solution. In many scenarios, like training a rover to move on a Martian surface, the cost of obtaining samples for learning can be high (in terms of the robot's energy expenditure, etc.), so sample efficiency is an important subproblem which deserves special attention. Very often the environment has intrinsic symmetries which the agent can leverage to improve performance and learn more efficiently. For example, in the Cart-Pole domain [1, 2] the state-action space is symmetric with respect to reflection about the plane perpendicular to the direction of motion of the cart (Figure 1). In fact, in many environments, the number of symmetry relations tends to increase with the dimensionality of the state space. For instance, for the simple case of a grid world of dimension d (Figure 1) there exist O(d!2
Jun-9-2017