Beyond NDCG: behavioral testing of recommender systems with RecList
Chia, Patrick John, Tagliabue, Jacopo, Bianchi, Federico, He, Chloe, Ko, Brian
arXiv.org Artificial Intelligence
As with most Machine Learning systems, recommender systems are typically evaluated through performance metrics computed over held-out data points. However, real-world behavior is undoubtedly nuanced: ad hoc error analysis and deployment-specific tests must be employed to ensure the desired quality in actual deployments. In this paper, we propose RecList, a behavioral-based testing methodology. RecList organizes recommender systems by use case and introduces a general plug-and-play procedure to scale up behavioral testing. We demonstrate its capabilities by analyzing known algorithms and black-box commercial systems, and we release RecList as an open source, extensible package for the community.
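To make the contrast in the abstract concrete, here is a minimal sketch of how one aggregate metric can hide a behavioral failure. It does not use the RecList API. The function names (`ndcg_at_k`, `hit_rate_by_slice`), the toy data, and the "head"/"tail" popularity slices are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k for a single ranked list (illustrative helper)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def hit_rate_by_slice(predictions, targets, slice_of, k):
    """Hit rate@k computed separately for each slice of the test set."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ranked, target in zip(predictions, targets):
        label = slice_of(target)
        totals[label] += 1
        hits[label] += int(target in ranked[:k])
    return {label: hits[label] / totals[label] for label in totals}


# Toy data (assumed): four test cases, each with one held-out target item.
predictions = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "d"], ["a", "b", "c"]]
targets = ["a", "b", "x", "y"]
# Hypothetical slicing: popular ("head") versus rare ("tail") items.
popularity = {"a": "head", "b": "head", "x": "tail", "y": "tail"}

# Aggregate view: one number over the whole held-out set.
mean_ndcg = sum(
    ndcg_at_k(ranked, {target}, k=3) for ranked, target in zip(predictions, targets)
) / len(targets)

# Behavioral view: the same predictions, broken down by slice.
slice_hit_rates = hit_rate_by_slice(predictions, targets, popularity.get, k=3)

print(f"mean NDCG@3: {mean_ndcg:.3f}")
print(f"hit rate@3 by slice: {slice_hit_rates}")
```

In this toy setup the mean NDCG@3 gives no hint that every rare target is missed, while the per-slice breakdown shows it directly. A deployment-specific test could then assert a minimum hit rate for each slice.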
Nov-18-2021
- Country:
- North America > United States > California (0.28)
- Genre:
- Research Report (0.50)
- Industry:
- Information Technology (1.00)
- Leisure & Entertainment (0.68)
- Media (0.68)
- Technology: