A General Framework for Off-Policy Learning with Partially-Observed Reward