A maximum-entropy approach to off-policy evaluation in average-reward MDPs
