On the Reliability of Sampling Strategies in Offline Recommender Evaluation