Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

May-28-2024–arXiv.org Machine Learning

Balancing the exploration-exploitation trade-off is a fundamental dilemma in reinforcement learning (RL). It has been addressed systematically by two main approaches: optimism in the face of uncertainty (OFU) and Thompson sampling (TS). OFU methods first construct confidence sets for the environment or model parameters from the samples observed so far. They then find the reward-maximizing (optimistic) parameters within the confidence set, and construct and execute a policy that is optimal with respect to those parameters [1]. Various OFU algorithms have been shown to have strong theoretical guarantees in bandits [2]. TS, by contrast, is a Bayesian method: environment or model parameters are sampled from a posterior that is updated along the way from the observed samples and a prior, and a policy that is optimal with respect to the sampled parameters is constructed and executed [3].
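The two loops described above can be made concrete on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a Bernoulli multi-armed bandit rather than a linear quadratic regulator, a Beta(1, 1) prior for TS, and the standard UCB1 index as the OFU rule. The function names `thompson_sampling` and `ucb1`, the arm means, and the seeds are illustrative choices.

```python
import math
import random


def thompson_sampling(true_means, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Assumption: independent Beta(1, 1) priors on each arm's success rate.
    successes = [1] * n_arms
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Sample one parameter per arm from the current posterior.
        sampled = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # Act optimally with respect to the sampled parameters.
        arm = max(range(n_arms), key=lambda a: sampled[a])
        # Observe a Bernoulli reward from the hidden environment.
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the pulled arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


def ucb1(true_means, rounds, seed=0):
    """UCB1 as a simple OFU rule: empirical mean plus a confidence radius."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Pull every arm once so each confidence radius is defined.
            arm = t - 1
        else:
            # Pick the arm whose optimistic estimate is largest.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1 if rng.random() < true_means[arm] else 0
        totals[arm] += reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical arm success probabilities
    print("TS pulls per arm:  ", thompson_sampling(means, 3000, seed=1))
    print("UCB1 pulls per arm:", ucb1(means, 3000, seed=1))
```

In both cases the pull counts should concentrate on the arm with the highest mean. The difference lies in how exploration is driven: TS randomizes through posterior sampling, while UCB1 deterministically inflates each estimate by its confidence radius.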

artificial intelligence, bayesian inference, machine learning

Country:
- Asia (0.28)

Genre:
- Research Report (0.40)

Industry:
- Energy > Oil & Gas > Upstream (0.34)

Technology:
- Information Technology
  - Mathematics of Computing (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Uncertainty
      - Bayesian Inference (0.47)
    - Machine Learning
      - Statistical Learning (0.68)
      - Learning Graphical Models > Directed Networks
        - Bayesian Learning (0.47)
