A Further Related Work

Neural Information Processing Systems 

The "dueling bandits" problem, initially proposed as a model for similar recommendation systems

A number of works in recent years explore online problems where an agent responds to the decision-maker's actions, influencing their reward.

The "revealed preferences" literature involves a similar requirement of learning a mapping

Some recent work has begun to explore the problem of designing optimal strategies in a repeated game against agents who adapt their strategies over time using a no-regret algorithm.

As such, the empirical probability of b must be close to 1/2.

We make use of a lemma from [2], which we restate here.

Lemma 8. Consider two vectors

We prove local learnability results for each case.
