From Clicks to Conversions: Recommendation for long-term reward
Chagniot, Philomène, Vasile, Flavian, Rohde, David
A modern approach to recommendation will look at this log in order to improve future recommendations. By examining how similar users respond to different recommendations, it becomes possible to discover better recommendations and to keep improving the system. This procedure of learning by experimentation in some respects mimics randomized controlled trials in medicine, where a population is split into two similar groups that receive different treatments. Medical trials are, however, simpler: an intervention or a placebo is administered to each group, and long-term impacts are then observed with no further interventions delivered.

The challenges of credit attribution in the case of delayed reward and multiple actions.

In contrast with medical trials, where the treatment is frequently a binary variable, recommender systems deliver multiple actions at variable times, leading to combinatorially complex treatments. For simplicity, in our previous work on RecoGym [2], we assumed that both the current recommendation and the reward are conditionally independent of past actions, which makes the recommendation problem amenable to contextual bandit and supervised value modeling approaches.
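To make the simplifying assumption concrete, the following is a minimal sketch of supervised value modeling in a contextual bandit setting. It is not the RecoGym implementation: the log format, the context and action names, and both helper functions are hypothetical. It treats every logged (context, action, reward) triple as independent of earlier actions, and it ignores any bias introduced by the policy that collected the log.

```python
from collections import defaultdict


def fit_value_model(log):
    """Estimate the mean reward of each (context, action) pair from a log.

    Each log entry is a hypothetical (context, action, reward) tuple and is
    treated as independent of all earlier actions.
    """
    counts = defaultdict(lambda: [0, 0])  # (context, action) -> [reward sum, impressions]
    for context, action, reward in log:
        stats = counts[(context, action)]
        stats[0] += reward
        stats[1] += 1
    return {key: total / n for key, (total, n) in counts.items()}


def recommend(values, context, actions):
    """Pick the action with the highest estimated reward for this context."""
    # Unseen (context, action) pairs default to an estimate of 0.0.
    return max(actions, key=lambda a: values.get((context, a), 0.0))


# Hypothetical toy log of (context, action, reward) triples.
log = [
    ("sports_fan", "shoes", 1),
    ("sports_fan", "shoes", 0),
    ("sports_fan", "book", 0),
    ("reader", "book", 1),
    ("reader", "shoes", 0),
]

values = fit_value_model(log)
best = recommend(values, "sports_fan", ["shoes", "book"])
```

Because each reward is credited only to the single action that immediately preceded it, a model of this kind cannot represent a conversion that results from several recommendations delivered over time, which is the credit attribution problem described above.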
Sep-1-2020