Near-Optimal Learning and Planning in Separated Latent MDPs

Fan Chen, Constantinos Daskalakis, Noah Golowich, Alexander Rakhlin

arXiv.org Machine Learning 

Reinforcement Learning (Kaelbling et al., 1996; Sutton and Barto, 2018) captures the common challenge of learning a good policy for an agent taking a sequence of actions in an unknown, dynamic environment, whose state transitions and reward emissions are influenced by the actions taken by the agent. Reinforcement learning has recently contributed to several headline results in Deep Learning, including Atari (Mnih et al., 2013), Go (Silver et al., 2016), and the development of Large Language Models (Christiano et al., 2017; Stiennon et al., 2020; Ouyang et al., 2022). This practical success has also sparked a burst of recent work on expanding the algorithmic, statistical, and learning-theoretic foundations of reinforcement learning, toward bridging the gap between theory and practice. In general, the agent might not fully observe the state of the environment, instead receiving only imperfect observations of it. Such a setting is captured by the general framework of Partially Observable Markov Decision Processes (POMDPs) (Smallwood and Sondik, 1973).
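For reference, the standard POMDP formalism (not spelled out in this excerpt; the notation below is the conventional one, not necessarily that of the paper) is a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{O}, \mathbb{T}, \mathbb{O}, r, H)$, where $\mathcal{S}$ is the latent state space, $\mathcal{A}$ the action space, $\mathcal{O}$ the observation space, $\mathbb{T}(s' \mid s, a)$ the transition kernel, $\mathbb{O}(o \mid s)$ the observation (emission) distribution, $r$ the reward function, and $H$ the horizon. The agent never observes the state $s_h$ directly: at each step $h$ it receives $o_h \sim \mathbb{O}(\cdot \mid s_h)$, selects $a_h$ as a function of the observation-action history $(o_1, a_1, \ldots, o_h)$, and the latent state evolves as $s_{h+1} \sim \mathbb{T}(\cdot \mid s_h, a_h)$.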
