Exponential Family Model-Based Reinforcement Learning via Score Matching

Gene Li, Junbo Li, Nathan Srebro, Zhaoran Wang, Zhuoran Yang

Dec-28-2021–arXiv.org Machine Learning

This paper studies the regret minimization problem for finite-horizon, episodic reinforcement learning (RL) with infinitely large state and action spaces. Empirically, RL has succeeded in diverse domains even when the problem size, measured in the number of states and actions, explodes [35, 44, 28]. The key to developing sample-efficient algorithms is to leverage function approximation, which lets us generalize across different state-action pairs. Much theoretical progress has been made toward understanding function approximation in RL, but existing theory typically requires strong linearity assumptions on either the transition dynamics [e.g., 55, 26, 10, 36] or the action-value functions [e.g., 30, 57] of the Markov Decision Process (MDP). Most real-world problems are nonlinear, however, and our theoretical understanding of these settings remains limited. We therefore ask: can we design provably efficient RL algorithms in nonlinear environments? Recently, Chowdhury et al. [13] introduced a nonlinear setting in which the state-transition measures are finitely parameterized exponential family models, and they proposed estimating the model parameters via maximum likelihood estimation (MLE). The exponential family is a well-studied and powerful statistical framework, which makes it a natural model class to consider beyond linear models.
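To make the exponential family transition model and its MLE fit concrete, here is a minimal sketch. It assumes the commonly used form P_W(s' | s, a) ∝ q(s') exp(ψ(s')ᵀ W φ(s, a)), with base measure q, next-state features ψ, state-action features φ, and parameter matrix W; this exact parameterization is an assumption, not quoted from the text above. The sketch also restricts next states to a small finite set so the normalizer is an exact finite sum. That simplification is ours: the setting studied here has infinitely large state spaces, where the normalizer is an integral. The helper names (`transition_probs`, `mle_step`, `log_likelihood`) and all feature values are hypothetical.

```python
import math
import random


def transition_probs(W, phi_sa, psi, log_q):
    """P_W(s'|s,a) proportional to q(s') * exp(psi(s')^T W phi(s,a)).

    Assumption: next states form a finite set, so normalization is a sum.
    W: d x m matrix, phi_sa: length-m features, psi: K rows of length d,
    log_q: length-K log base measure.
    """
    d, m = len(W), len(W[0])
    # W phi(s,a) is a d-dimensional natural-parameter vector.
    w_phi = [sum(W[i][j] * phi_sa[j] for j in range(m)) for i in range(d)]
    logits = [lq + sum(p[i] * w_phi[i] for i in range(d))
              for lq, p in zip(log_q, psi)]
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - mx) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def log_likelihood(W, data, psi, log_q):
    """Average log-likelihood of observed (phi_sa, next_state_index) pairs."""
    return sum(math.log(transition_probs(W, phi, psi, log_q)[k])
               for phi, k in data) / len(data)


def mle_step(W, data, psi, log_q, lr=0.1):
    """One gradient-ascent step on the average log-likelihood.

    Per-sample gradient: (psi(s') - E_{P_W}[psi]) phi(s,a)^T.
    """
    d, m = len(W), len(W[0])
    grad = [[0.0] * m for _ in range(d)]
    for phi_sa, k in data:
        probs = transition_probs(W, phi_sa, psi, log_q)
        # Model expectation of the sufficient statistic psi(s').
        mean_psi = [sum(p * ps[i] for p, ps in zip(probs, psi))
                    for i in range(d)]
        for i in range(d):
            for j in range(m):
                grad[i][j] += (psi[k][i] - mean_psi[i]) * phi_sa[j]
    n = len(data)
    return [[W[i][j] + lr * grad[i][j] / n for j in range(m)]
            for i in range(d)]


if __name__ == "__main__":
    random.seed(0)
    psi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hypothetical next states, d = 2
    log_q = [0.0, 0.0, 0.0]                     # uniform base measure
    W_true = [[1.0, -0.5], [0.5, 1.0]]          # hypothetical ground truth, m = 2
    data = []
    for _ in range(200):
        phi = [random.uniform(-1, 1), random.uniform(-1, 1)]
        probs = transition_probs(W_true, phi, psi, log_q)
        data.append((phi, random.choices(range(3), weights=probs)[0]))
    W = [[0.0, 0.0], [0.0, 0.0]]
    before = log_likelihood(W, data, psi, log_q)
    for _ in range(200):
        W = mle_step(W, data, psi, log_q, lr=0.5)
    print("avg log-likelihood:", before, "->", log_likelihood(W, data, psi, log_q))
```

The exponential family log-likelihood is concave in W, so plain gradient ascent with a modest step size is enough for this toy case. The only model-dependent quantity the gradient needs is the expectation E_{P_W}[ψ], which requires the normalized distribution.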
