Off-Policy Evaluation for Human Feedback
Qitong Gao, Ge Gao

Neural Information Processing Systems 

Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation in reinforcement learning (RL): it estimates the performance and/or ranking of target (evaluation) policies using only offline trajectories.
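The sketch below makes this setting concrete. It estimates and ranks target policies from offline trajectories using ordinary (trajectory-wise) importance sampling, a classic OPE baseline. It is not the method proposed in this paper. The function names `is_estimate` and `rank_policies`, the (state, action, reward) trajectory format, and the assumption that behavior-policy probabilities are known and nonzero for every logged action are all illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One offline step: (state, action, reward). Integer states/actions keep the toy example simple.
Step = Tuple[int, int, float]
# Hypothetical policy representation: a function returning pi(action | state).
Policy = Callable[[int, int], float]


def is_estimate(
    trajectories: Sequence[Sequence[Step]],
    target: Policy,
    behavior: Policy,
    gamma: float = 1.0,
) -> float:
    """Ordinary (trajectory-wise) importance sampling estimate of the target
    policy's expected discounted return, using only trajectories logged by `behavior`.

    Assumes behavior(s, a) > 0 for every logged (s, a) pair (coverage).
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # product of per-step likelihood ratios pi_target / pi_behavior
        ret = 0.0     # discounted return of the logged trajectory
        for t, (s, a, r) in enumerate(traj):
            weight *= target(s, a) / behavior(s, a)
            ret += (gamma ** t) * r
        total += weight * ret
    return total / len(trajectories)


def rank_policies(
    trajectories: Sequence[Sequence[Step]],
    policies: Dict[str, Policy],
    behavior: Policy,
    gamma: float = 1.0,
) -> Tuple[List[str], Dict[str, float]]:
    """Score every candidate policy with the IS estimate and sort names best-first."""
    scores = {name: is_estimate(trajectories, pi, behavior, gamma) for name, pi in policies.items()}
    order = sorted(scores, key=lambda name: scores[name], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy logged data from a uniform behavior policy over actions {0, 1}:
    # action 0 earned reward 1.0, action 1 earned reward 0.0.
    data = [[(0, 0, 1.0)], [(0, 1, 0.0)]]
    uniform = lambda s, a: 0.5
    candidates = {
        "always_0": lambda s, a: 1.0 if a == 0 else 0.0,
        "always_1": lambda s, a: 1.0 if a == 1 else 0.0,
        "uniform": uniform,
    }
    order, scores = rank_policies(data, candidates, behavior=uniform)
    print(order)   # ['always_0', 'uniform', 'always_1']
    print(scores)  # {'always_0': 1.0, 'always_1': 0.0, 'uniform': 0.5}
```

Ordinary importance sampling is unbiased under the coverage assumption but can have high variance on long horizons, which is why weighted importance sampling and doubly robust estimators are common alternatives.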
