Why generalization in RL is difficult: epistemic POMDPs and implicit partial observability