Off-Policy Interval Estimation with Lipschitz Value Iteration

Oct-3-2025, 00:03:19 GMT–Neural Information Processing Systems

Reinforcement learning (RL) (e.g., Sutton & Barto, 1998) has become widely used in tasks like [...] Li, 2016; Liu et al., 2018a), estimating the expected reward of a target policy using observational data gathered from previous behavior policies, therefore holds tremendous promise for designing [...] Our method is efficient and provably convergent. Our work is closely related to off-policy point estimation.

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States > Texas (0.14)

Industry:
- Health & Medicine (0.75)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning > Optimization (0.94)
