Online Learning for Non-Stationary A/B Tests

Andrés Muñoz Medina, Sergei Vassilvitskii, Dong Yin

arXiv.org Machine Learning 

Whether it is a minor tweak or a major update, releasing a new version of a running system is stressful. Although a release has typically gone through rounds of offline testing, real-world testing often uncovers additional corner cases that manifest as bugs, inefficiencies, or overall poor performance. This is especially true in machine learning applications, where models are typically trained to maximize a proxy objective, and a model that performs better on offline metrics is not guaranteed to work well in practice. The usual approach in such scenarios is to evaluate the new system through a series of closely monitored A/B tests. The new version is first released to a small number of customers and, if no concerns arise and the metrics look good, the portion of traffic served by the new system is slowly increased. While A/B tests provide a sense of safety, in that a detrimental change will be quickly observed and corrected (or rolled back), they are not a silver bullet. First, A/B tests are labor intensive: they are typically monitored manually, with an engineer or technician checking the results on a regular basis (for example, daily or weekly). Second, the evaluation usually depends on average metrics--e.g.
