Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for the Model-free LQR
Toso, Leonardo F., Zhan, Donglin, Anderson, James, Wang, Han
–arXiv.org Artificial Intelligence
One of the main successes of Reinforcement Learning (RL), for example in robotics, is its ability to learn control policies that rapidly adapt to different agents and environments (Wang et al., 2016; Duan et al., 2016; Rothfuss et al., 2018). This idea of learning a control policy that efficiently adapts to unseen RL tasks is referred to as meta-learning, or learning to learn. The most popular approach is Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017, 2019). In the context of RL, the role of MAML is to exploit the diversity of tasks drawn from a common task distribution to learn, in a multi-task and heterogeneous setting, a control policy that is only a few policy-gradient (PG) steps away from the optimal policy of an unseen task. Despite its success in image classification and RL, the theoretical convergence guarantees of MAML remain poorly understood in both the model-based and the model-free setting.
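The MAML idea described above can be sketched concretely for a scalar LQR problem. The snippet below is a minimal illustration, not the paper's algorithm: it assumes a toy task distribution over scalar systems x_{t+1} = a x_t + b u_t, uses a two-point zeroth-order estimate as the "model-free" policy gradient, one inner PG step per task, and an outer loop that descends the average post-adaptation cost. All function names, step sizes, and the task distribution are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def lqr_cost(K, a, b, q=1.0, r=1.0, x0=1.0, T=30):
    """Finite-horizon LQR cost of u_t = -K x_t on x_{t+1} = a x_t + b u_t."""
    x, cost = x0, 0.0
    for _ in range(T):
        u = -K * x
        cost += q * x**2 + r * u**2
        x = a * x + b * u
    return cost

def zo_grad(f, K, delta=1e-3):
    """Two-point zeroth-order gradient estimate (stands in for a
    model-free policy-gradient oracle)."""
    return (f(K + delta) - f(K - delta)) / (2 * delta)

def adapt(K, task, alpha=0.01):
    """Inner loop: one policy-gradient step on a single task."""
    a, b = task
    return K - alpha * zo_grad(lambda k: lqr_cost(k, a, b), K)

# Hypothetical task distribution: scalar systems with varying (a, b).
tasks = [(rng.uniform(0.8, 1.2), rng.uniform(0.8, 1.2)) for _ in range(8)]

# Outer (meta) loop: descend the cost *after* one inner adaptation step,
# averaged over tasks, so the meta-policy is a few PG steps from optimal.
K_meta, beta = 0.5, 0.005
for _ in range(200):
    meta_obj = lambda k: np.mean([lqr_cost(adapt(k, t), *t) for t in tasks])
    K_meta -= beta * zo_grad(meta_obj, K_meta)
```

On an unseen task from the same distribution, `adapt(K_meta, task)` plays the role of the few-shot PG adaptation the abstract refers to; the paper's actual analysis covers the general matrix case with sample-based gradient estimates.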
Jan-25-2024