VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation
Nguyen-Tang, Thanh, Arora, Raman
arXiv.org Artificial Intelligence
In this section, we empirically evaluate the proposed algorithm VIPeR against several state-of-the-art baselines: (a) PEVI (Jin et al., 2021), which explicitly constructs a lower confidence bound (LCB) for pessimism in a linear model (we refer to this algorithm as LinLCB in our experiments for convenience); (b) NeuraLCB (Nguyen-Tang et al., 2022a), which explicitly constructs an LCB using neural network gradients; (c) NeuraLCB (Diag), which is NeuraLCB with a diagonal approximation for estimating the confidence set, as suggested in Nguyen-Tang et al. (2022a); (d) Lin-VIPeR, which is VIPeR instantiated with linear function approximation instead of neural network function approximation; and (e) NeuralGreedy (respectively, LinGreedy), which uses neural networks (respectively, linear models) to fit the offline data and acts greedily with respect to the estimated state-action value functions without any pessimism. When the parametric class F in Algorithm 1 is that of neural networks, we refer to VIPeR as Neural-VIPeR. We do not use data splitting in the experiments. Further algorithmic details of the baselines are provided in Section H. We evaluate all algorithms in two problem settings: (1) the underlying MDP is a linear MDP whose reward functions and transition kernels are linear in some known feature map (Jin et al., 2020), and (2) the underlying MDP is non-linear with horizon length H = 1 (i.e., non-linear contextual bandits) (Zhou et al., 2020), where the reward function is either synthetic or constructed from MNIST
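
To make the contrast between a pessimistic linear baseline and a greedy one concrete, the following is a minimal sketch for the H = 1 (contextual bandit) case. It is not the authors' implementation of PEVI/LinLCB or LinGreedy: the function names, the ridge parameter `lam`, the confidence multiplier `beta`, and the toy dataset are all assumptions made for illustration. The sketch fits a ridge regression on offline data, then selects an action either greedily or by maximizing the estimate minus an uncertainty bonus of the form beta * sqrt(phi^T Lambda^{-1} phi).

```python
import numpy as np

def fit_ridge(features, rewards, lam=1.0):
    """Ridge regression on offline (feature, reward) pairs.

    Returns the estimate theta and the regularized Gram matrix
    Lambda = Phi^T Phi + lam * I. (lam is an illustrative choice.)
    """
    d = features.shape[1]
    gram = features.T @ features + lam * np.eye(d)
    theta = np.linalg.solve(gram, features.T @ rewards)
    return theta, gram

def greedy_action(action_features, theta):
    """Pick the action with the highest estimated reward (no pessimism)."""
    return int(np.argmax(action_features @ theta))

def lcb_action(action_features, theta, gram, beta=1.0):
    """Pick the action with the highest lower confidence bound.

    LCB_i = phi_i^T theta - beta * sqrt(phi_i^T Lambda^{-1} phi_i),
    where beta is a hypothetical confidence multiplier.
    """
    gram_inv = np.linalg.inv(gram)
    # Per-action quadratic form phi_i^T Lambda^{-1} phi_i.
    quad = np.einsum("ij,jk,ik->i", action_features, gram_inv, action_features)
    return int(np.argmax(action_features @ theta - beta * np.sqrt(quad)))

# Toy offline dataset (assumed): action 0 is observed 100 times with
# reward 0.5; action 1 is observed once with reward 1.0.
action_features = np.eye(2)  # one-hot feature per action
features = np.vstack([np.tile(action_features[0], (100, 1)),
                      action_features[1][None, :]])
rewards = np.concatenate([np.full(100, 0.5), [1.0]])

theta, gram = fit_ridge(features, rewards)
print("greedy picks:", greedy_action(action_features, theta))
print("LCB picks:", lcb_action(action_features, theta, gram))
```

In this toy dataset the rarely observed action has a slightly higher point estimate, so the greedy rule selects it, while the LCB rule penalizes its larger uncertainty and selects the well-covered action instead. The neural baselines replace the linear features with network outputs or gradients, which this sketch does not attempt to reproduce.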
Mar-3-2023