From Clicks to Conversions: Recommendation for long-term reward

Philomène Chagniot, Flavian Vasile, David Rohde

arXiv.org Machine Learning 

A modern approach to recommendation will look at this log in order to improve future recommendations. By examining how similar users respond to different recommendations, it becomes possible to discover better recommendations and continue to improve the system. This procedure of learning by experimentation in some respects mimics randomized controlled trials in medicine, where a population is split into two and different treatments are delivered to similar groups. Medical trials are, however, simpler: an intervention or a placebo is administered to each group, and the long-term impacts are then observed with no further interventions delivered.

The challenges of credit attribution in the case of delayed reward and multiple actions

In contrast with medical trials, where the treatment is frequently a binary variable, recommender systems deliver multiple actions at variable times, leading to combinatorially complex treatments. For simplicity, in our previous work on RecoGym [2], we assumed that both the current recommendation and the reward are conditionally independent of past actions, which makes the recommendation problem amenable to contextual bandits and supervised value modeling approaches.
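The contrast between a single binary treatment and a sequence of recommendations, and the simplification that sidesteps it, can be illustrated with a small sketch. It assumes a hypothetical catalogue of `num_items` items shown over `num_steps` steps, and a hypothetical log of `(context, action, reward)` rows; none of these names, nor the log format, come from RecoGym. The first part counts distinct treatment sequences, which grow as `num_items ** num_steps`. The second part applies the conditional-independence assumption described above: each logged reward is credited only to the recommendation shown at that moment, so the log reduces to per-(context, action) click-through estimates, the simplest form of a supervised value model.

```python
from collections import defaultdict
from itertools import product


def count_treatment_sequences(num_items: int, num_steps: int) -> int:
    """Number of distinct recommendation sequences of length num_steps.

    A medical trial with intervention vs. placebo has 2 treatments; a
    recommender choosing among num_items at each of num_steps steps has
    num_items ** num_steps possible treatment sequences.
    """
    return num_items ** num_steps


def enumerate_sequences(num_items: int, num_steps: int):
    """List every sequence explicitly (only feasible for tiny examples)."""
    return list(product(range(num_items), repeat=num_steps))


def fit_bandit_value_model(log):
    """Estimate the click-through rate for each (context, action) pair.

    Assumes (as in the contextual-bandit simplification) that a reward
    depends only on the current context and current recommendation, so
    past actions are ignored and each row is treated as independent.
    log: iterable of (context, action, reward) with reward in {0, 1}.
    """
    clicks = defaultdict(int)
    shows = defaultdict(int)
    for context, action, reward in log:
        shows[(context, action)] += 1
        clicks[(context, action)] += reward
    return {key: clicks[key] / shows[key] for key in shows}


def recommend(value_model, context, actions):
    """Return the action with the highest estimated value in this context.

    Pairs never seen in the log default to an estimate of 0.0; ties are
    broken in favour of the earliest action in `actions`.
    """
    return max(actions, key=lambda a: value_model.get((context, a), 0.0))


if __name__ == "__main__":
    # Binary treatment vs. a short sequence over a small catalogue.
    print(count_treatment_sequences(2, 1))   # 2
    print(count_treatment_sequences(10, 5))  # 100000

    # Hypothetical logged interactions: (user segment, item shown, clicked).
    log = [
        ("sports", 0, 1), ("sports", 0, 0), ("sports", 1, 1), ("sports", 1, 1),
        ("news", 0, 1), ("news", 0, 1), ("news", 1, 0),
    ]
    model = fit_bandit_value_model(log)
    print(recommend(model, "sports", [0, 1]))  # 1
    print(recommend(model, "news", [0, 1]))    # 0
```

The per-pair averaging is only valid under the independence assumption: if a conversion is actually caused by an earlier recommendation in the session, this model credits whichever item happened to be shown last, which is exactly the credit-attribution problem that delayed reward and multiple actions introduce.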
