Continuous Control with Contexts, Provably

Du, Simon S., Wang, Ruosong, Wang, Mengdi, Yang, Lin F.

Oct-29-2019–arXiv.org Machine Learning

A fundamental challenge in artificial intelligence is to build an agent that generalizes and adapts to unseen environments. A common strategy is to build a decoder that takes the context of an unseen environment as input and generates a policy accordingly. This paper studies how to build such a decoder for the fundamental continuous control task, the linear quadratic regulator (LQR), which can model a wide range of real-world physical environments. We present a simple algorithm for this problem that uses the upper confidence bound (UCB) principle to refine the estimate of the decoder and balance the exploration-exploitation trade-off. Theoretically, our algorithm enjoys a $\widetilde{O}\left(\sqrt{T}\right)$ regret bound in the online setting, where $T$ is the number of environments the agent has played. This also implies that after playing $\widetilde{O}\left(1/\epsilon^2\right)$ environments, the agent can transfer the learned knowledge to obtain an $\epsilon$-suboptimal policy for an unseen environment. To our knowledge, this is the first provably efficient algorithm for building a decoder in the continuous control setting. While our main focus is theoretical, we also present experiments that demonstrate the effectiveness of our algorithm.
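The decode-then-plan idea in the abstract can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes a hypothetical decoder that is linear in the context, $A(c) = \sum_i c_i \Theta_A[i]$ and $B(c) = \sum_i c_i \Theta_B[i]$. It also treats the decoder parameters as already known and omits the UCB-based estimation and exploration entirely. Given the decoded dynamics, the policy is the standard infinite-horizon discrete-time LQR controller $u = -Kx$, computed here by iterating the Riccati equation to a fixed point. The function names `decode_dynamics`, `lqr_gain`, and `policy_for_context` are hypothetical.

```python
import numpy as np


def decode_dynamics(context, theta_A, theta_B):
    """Hypothetical linear decoder: map a context vector c to LQR system matrices.

    Assumes A(c) = sum_i c[i] * theta_A[i] and B(c) = sum_i c[i] * theta_B[i],
    where theta_A has shape (d, n, n), theta_B has shape (d, n, m), and c has shape (d,).
    """
    A = np.tensordot(context, theta_A, axes=1)  # shape (n, n)
    B = np.tensordot(context, theta_B, axes=1)  # shape (n, m)
    return A, B


def lqr_gain(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Infinite-horizon discrete-time LQR via fixed-point iteration of the Riccati equation.

    For dynamics x_{t+1} = A x_t + B u_t and per-step cost x^T Q x + u^T R u,
    returns (K, P) where u = -K x is the optimal stationary policy and P is the
    cost-to-go matrix. Assumes (A, B) is stabilizable; if the iteration has not
    converged after max_iter steps, the last iterate is used.
    """
    P = np.array(Q, dtype=float)
    for _ in range(max_iter):
        # K_k = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_{k+1} = Q + A^T P A - A^T P B K_k
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


def policy_for_context(context, theta_A, theta_B, Q, R):
    """Decode the environment from its context, then plan: return the LQR gain K."""
    A, B = decode_dynamics(context, theta_A, theta_B)
    K, _ = lqr_gain(A, B, Q, R)
    return K


if __name__ == "__main__":
    # Two-dimensional context; the first basis component is a double integrator.
    theta_A = np.array([[[1.0, 1.0], [0.0, 1.0]],
                        [[0.1, 0.0], [0.0, 0.1]]])  # shape (2, 2, 2)
    theta_B = np.array([[[0.0], [1.0]],
                        [[0.0], [0.5]]])            # shape (2, 2, 1)
    context = np.array([1.0, 0.0])
    A, B = decode_dynamics(context, theta_A, theta_B)
    K = policy_for_context(context, theta_A, theta_B, np.eye(2), np.eye(1))
    rho = max(abs(np.linalg.eigvals(A - B @ K)))
    print("gain K:", K)
    print("closed-loop spectral radius:", rho)  # below 1 means the policy stabilizes
```

In the online setting the abstract describes, $\Theta_A$ and $\Theta_B$ would be unknown and estimated from the environments played so far. Roughly speaking, a UCB-style method selects an optimistic model from a confidence set instead of plugging in a single point estimate as this sketch does. For production use, `scipy.linalg.solve_discrete_are` solves the same Riccati equation directly.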

Country:
- Europe > Sweden (0.14)
- North America > United States
  - California > Los Angeles County > Los Angeles (0.14)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine > Therapeutic Area
  - Immunology (0.46)
- Energy > Oil & Gas
  - Upstream (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning (1.00)