Beyond NDCG: behavioral testing of recommender systems with RecList