Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning