preference perceptron
Coactive Learning
Shivaswamy, Pannaga, Joachims, Thorsten
We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. Interactions in the Coactive Learning model take the following form: at each step, the system (e.g. search engine) receives a context (e.g. query) and predicts an object (e.g. ranking); the user responds by correcting the system if necessary, providing a slightly improved but not necessarily optimal object as feedback. We argue that such preference feedback can be inferred in large quantity from observable user behavior (e.g., clicks in web search), unlike the optimal feedback required in the expert model or the cardinal valuations required for bandit learning. Despite the relaxed requirements for the feedback, we show that it is possible to adapt many existing online learning algorithms to the coactive framework. In particular, we provide algorithms that achieve square root regret in terms of cardinal utility, even though the learning algorithm never observes cardinal utility values directly. We also provide an algorithm with logarithmic regret in the case of strongly convex loss functions. An extensive empirical study demonstrates the applicability of our model and algorithms on a movie recommendation task, as well as ranking for web search.
Online Structured Prediction via Coactive Learning
Shivaswamy, Pannaga, Joachims, Thorsten
We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. At each step, the system (e.g. search engine) receives a context (e.g. query) and predicts an object (e.g. ranking). The user responds by correcting the system if necessary, providing a slightly improved -- but not necessarily optimal -- object as feedback. We argue that such feedback can often be inferred from observable user behavior, for example, from clicks in web-search. Evaluating predictions by their cardinal utility to the user, we propose efficient learning algorithms that have ${\cal O}(\frac{1}{\sqrt{T}})$ average regret, even though the learning algorithm never observes cardinal utility values as in conventional online learning. We demonstrate the applicability of our model and learning algorithms on a movie recommendation task, as well as ranking for web-search.