When is Realizability Sufficient for Off-Policy Reinforcement Learning?