Oracle Inequalities for Model Selection in Offline Reinforcement Learning