Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

Shuangning Li, Chonghuan Wang, Jingyan Wang

arXiv.org Machine Learning 

Recommendation systems are widely deployed across online platforms. Users receive numerous recommendations every day: news and creators' content on social media, products in online marketplaces, services in freelancing labor markets, ads on websites, and so on. During the development of such systems, a recurring task for companies is to compare the performance of different recommendation algorithms and decide which one to deploy in production. A common approach to comparing two recommendation algorithms is the randomized controlled trial, also known as an A/B experiment. In a typical user-randomized A/B experiment, each user is assigned uniformly at random to a treatment group (served by one recommendation algorithm) or a control group (served by the other). The performance of the two algorithms is measured by a metric such as user engagement, click-through rate, or purchase revenue. Our goal is to estimate the global treatment effect (GTE), defined as the difference in this performance metric between deploying the treatment algorithm to all users and deploying the control algorithm to all users.
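To make the estimand and the standard estimator concrete, below is a minimal simulation sketch. The Bernoulli reward model, the greedy control policy, the epsilon-greedy treatment policy, and all parameter values are illustrative assumptions, not the paper's setting. The sketch computes the GTE by deploying each algorithm to all simulated users, and then the usual difference-in-means estimate from a user-randomized A/B experiment.

```python
# Toy simulation: the GTE (global deployment of each algorithm) versus
# the difference-in-means estimate from a user-randomized A/B experiment.
# Reward model and policies are illustrative assumptions only.
import random
import statistics

N_USERS = 10_000
N_ARMS = 5
N_ROUNDS = 20                            # recommendations shown per user
ARM_MEANS = [0.1, 0.2, 0.3, 0.4, 0.5]    # hypothetical click probabilities

def run_policy(policy, rng):
    """Serve one user with the given bandit policy; return mean reward."""
    counts = [0] * N_ARMS
    sums = [0.0] * N_ARMS
    total = 0.0
    for t in range(N_ROUNDS):
        arm = policy(counts, sums, t, rng)
        reward = 1.0 if rng.random() < ARM_MEANS[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total / N_ROUNDS

def greedy(counts, sums, t, rng):
    """Control: try each arm once, then always pick the best empirical mean."""
    if t < N_ARMS:
        return t
    return max(range(N_ARMS), key=lambda a: sums[a] / counts[a])

def eps_greedy(counts, sums, t, rng, eps=0.2):
    """Treatment: like greedy, but explore a random arm with probability eps."""
    if t < N_ARMS:
        return t
    if rng.random() < eps:
        return rng.randrange(N_ARMS)
    return max(range(N_ARMS), key=lambda a: sums[a] / counts[a])

rng = random.Random(0)

# Ground-truth GTE: deploy each algorithm to ALL users and compare.
mu_treat = statistics.mean(run_policy(eps_greedy, rng) for _ in range(N_USERS))
mu_ctrl = statistics.mean(run_policy(greedy, rng) for _ in range(N_USERS))
gte = mu_treat - mu_ctrl

# A/B experiment: assign each user uniformly at random to treatment or
# control, then estimate the GTE by the difference in group means.
treat_rewards, ctrl_rewards = [], []
for _ in range(N_USERS):
    if rng.random() < 0.5:
        treat_rewards.append(run_policy(eps_greedy, rng))
    else:
        ctrl_rewards.append(run_policy(greedy, rng))
gte_hat = statistics.mean(treat_rewards) - statistics.mean(ctrl_rewards)

print(f"GTE (global deployment): {gte:+.4f}")
print(f"A/B estimate           : {gte_hat:+.4f}")
```

In this sketch each user's bandit runs independently, so the difference-in-means estimate is unbiased for the GTE. The question raised by the title concerns settings where the algorithms share data across users, so the two groups can interfere and the naive estimate need not recover the GTE.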
