Exponential Family Model-Based Reinforcement Learning via Score Matching
Li, Gene; Li, Junbo; Srebro, Nathan; Wang, Zhaoran; Yang, Zhuoran
This paper studies the regret minimization problem for finite-horizon, episodic reinforcement learning (RL) with infinitely large state and action spaces. Empirically, RL has achieved success in diverse domains, even when the problem size (measured by the number of states and actions) explodes [35, 44, 28]. The key to developing sample-efficient algorithms is to leverage function approximation, which enables generalization across different state-action pairs. Much theoretical progress has been made towards understanding function approximation in RL. Existing theory typically requires strong linearity assumptions on the transition dynamics [e.g., 55, 26, 10, 36] or the action-value functions [e.g., 30, 57] of the Markov Decision Process (MDP). However, most real-world problems are nonlinear, and our theoretical understanding of these settings remains limited. Thus, we ask: Can we design provably efficient RL algorithms in nonlinear environments? Recently, Chowdhury et al. [13] introduced a nonlinear setting in which the state-transition measures are finitely parameterized exponential family models, and they proposed estimating the model parameters via maximum likelihood estimation (MLE). The exponential family is a well-studied and powerful statistical framework, so it is a natural model class to consider beyond linear models.
Dec-28-2021
- Country:
- North America > United States
- Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report (1.00)