When is Offline Policy Selection Sample Efficient for Reinforcement Learning?