Beyond NDCG: behavioral testing of recommender systems with RecList

Chia, Patrick John, Tagliabue, Jacopo, Bianchi, Federico, He, Chloe, Ko, Brian

Nov-18-2021, arXiv.org Artificial Intelligence

As with most Machine Learning systems, recommender systems are typically evaluated through performance metrics computed over held-out data points. However, real-world behavior is undoubtedly nuanced: ad hoc error analysis and deployment-specific tests must be employed to ensure the desired quality in actual deployments. In this paper, we propose RecList, a behavioral-based testing methodology. RecList organizes recommender systems by use case and introduces a general plug-and-play procedure to scale up behavioral testing. We demonstrate its capabilities by analyzing known algorithms and black-box commercial systems, and we release RecList as an open source, extensible package for the community.
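To make the contrast between an aggregate held-out metric and a behavioral test concrete, here is a minimal Python sketch. It does not use the RecList package or its API. The function names, the toy catalog, the "popular"/"rare" slicing rule, and the 0.5 threshold are all hypothetical, chosen only for illustration. The sketch computes hit rate at k over a small held-out set, then recomputes it per slice and flags any slice that falls below a minimum.

```python
from collections import defaultdict


def hit_rate_at_k(predictions, targets, k=3):
    """Fraction of cases whose held-out target appears in the top-k predictions."""
    if not targets:
        return 0.0
    hits = sum(1 for preds, target in zip(predictions, targets) if target in preds[:k])
    return hits / len(targets)


def hit_rate_by_slice(predictions, targets, slice_of, k=3):
    """Hit rate at k computed separately for each slice of the test set."""
    grouped = defaultdict(lambda: ([], []))
    for preds, target in zip(predictions, targets):
        slice_preds, slice_targets = grouped[slice_of(target)]
        slice_preds.append(preds)
        slice_targets.append(target)
    return {name: hit_rate_at_k(p, t, k) for name, (p, t) in grouped.items()}


def slices_below(per_slice, minimum):
    """Names of slices whose score is under the required minimum."""
    return sorted(name for name, score in per_slice.items() if score < minimum)


# Toy data (hypothetical): each row is a ranked recommendation list,
# paired with the item the user actually interacted with next.
POPULAR = {"shoes", "tshirt"}
predictions = [
    ["shoes", "tshirt", "socks"],
    ["tshirt", "shoes", "hat"],
    ["shoes", "tshirt", "hat"],
    ["shoes", "tshirt", "socks"],
]
targets = ["shoes", "tshirt", "scarf", "shoes"]

# Aggregate metric over the whole held-out set.
overall = hit_rate_at_k(predictions, targets)

# Behavioral check: the same metric, broken down by item popularity.
per_slice = hit_rate_by_slice(
    predictions, targets, lambda item: "popular" if item in POPULAR else "rare"
)
failing = slices_below(per_slice, minimum=0.5)

print(overall, per_slice, failing)
```

On this toy data the aggregate score looks acceptable while the "rare" slice fails outright, which is the kind of gap a deployment-specific test is meant to surface.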

reclist, recommendation, recommender system

Country:
- South America > Brazil (0.05)
- North America
  - Canada (0.04)
  - United States
    - New York > New York County
      - New York City (0.06)
    - Massachusetts > Suffolk County
      - Boston (0.04)
    - California > Santa Clara County
      - Palo Alto (0.04)
    - Alaska > Anchorage Municipality
      - Anchorage (0.04)
- Europe
  - France (0.05)
  - Italy (0.04)
  - United Kingdom > England
    - Greater London > London (0.04)
- Asia > China
  - Beijing > Beijing (0.04)

Genre:
- Research Report (0.50)

Industry:
- Information Technology (1.00)
- Leisure & Entertainment (0.68)
- Media (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Personal Assistant Systems (1.00)
  - Machine Learning (1.00)