VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation

Mar-3-2023–arXiv.org Artificial Intelligence

In this section, we empirically evaluate the proposed algorithm, VIPeR, against several state-of-the-art baselines: (a) PEVI (Jin et al., 2021), which explicitly constructs a lower confidence bound (LCB) for pessimism in a linear model (for convenience, we refer to this algorithm as LinLCB in our experiments); (b) NeuraLCB (Nguyen-Tang et al., 2022a), which explicitly constructs an LCB using neural network gradients; (c) NeuraLCB (Diag), which is NeuraLCB with a diagonal approximation for estimating the confidence set, as suggested in Nguyen-Tang et al. (2022a); (d) Lin-VIPeR, which is VIPeR instantiated with linear function approximation instead of neural network function approximation; and (e) NeuralGreedy (respectively, LinGreedy), which uses neural networks (respectively, linear models) to fit the offline data and acts greedily with respect to the estimated state-action value functions without any pessimism. When the parametric class F in Algorithm 1 is that of neural networks, we refer to VIPeR as Neural-VIPeR. We do not use data splitting in the experiments. Further algorithmic details of the baselines are provided in Section H. We evaluate all algorithms in two problem settings: (1) the underlying MDP is a linear MDP whose reward functions and transition kernels are linear in some known feature map (Jin et al., 2020), and (2) the underlying MDP is non-linear with horizon length H = 1 (i.e., a non-linear contextual bandit) (Zhou et al., 2020), where the reward function is either synthetic or constructed from MNIST
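To make the contrast between the greedy and LCB-based linear baselines concrete, the following is a minimal sketch for the H = 1 (contextual bandit) case with a linear model. It is not the authors' implementation: the function names, the ridge parameter `lam`, the pessimism scale `beta`, and the toy data are all assumptions chosen for illustration. The estimate is a ridge regression fit to the offline data, and the LCB subtracts a bonus proportional to sqrt(phi^T Lambda^{-1} phi), which is large for actions that are poorly covered by the data.

```python
import numpy as np

def fit_ridge(features, rewards, lam=1.0):
    """Ridge regression on offline (feature, reward) pairs.

    Returns the estimate theta_hat and the regularized Gram matrix Lambda.
    `lam` is a hypothetical regularization value.
    """
    d = features.shape[1]
    Lambda = lam * np.eye(d) + features.T @ features
    theta_hat = np.linalg.solve(Lambda, features.T @ rewards)
    return theta_hat, Lambda

def greedy_action(candidates, theta_hat):
    """LinGreedy-style choice: maximize the estimated reward, no pessimism."""
    return int(np.argmax(candidates @ theta_hat))

def lcb_action(candidates, theta_hat, Lambda, beta=1.0):
    """LinLCB-style choice: maximize estimate minus an uncertainty bonus.

    `beta` is a hypothetical pessimism scale, not a value from the paper.
    """
    Lambda_inv = np.linalg.inv(Lambda)
    # bonus[a] = sqrt(phi_a^T Lambda^{-1} phi_a) for each candidate action a
    bonus = np.sqrt(np.einsum("ad,de,ae->a", candidates, Lambda_inv, candidates))
    return int(np.argmax(candidates @ theta_hat - beta * bonus))

# Toy offline data (assumed): action 0 is well covered with reward 0.5,
# action 1 was observed once with a high reward of 1.0.
candidates = np.eye(2)  # one-hot feature vector per action
features = np.vstack([np.tile(candidates[0], (100, 1)), candidates[1][None, :]])
rewards = np.concatenate([np.full(100, 0.5), [1.0]])

theta_hat, Lambda = fit_ridge(features, rewards)
print("greedy picks action", greedy_action(candidates, theta_hat))
print("LCB picks action", lcb_action(candidates, theta_hat, Lambda))
```

On this toy data the greedy rule selects the rarely observed action 1, because its single observed reward gives it a slightly higher estimate, while the LCB rule selects the well-covered action 0. This is the behavior that pessimism is meant to produce when data coverage is uneven.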



