Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

We would like to thank the reviewers for their positive feedback!

General comments:

- Although we agree that the assumption of the Plackett-Luce (PL) model (as a generalization of the Bradley-Terry model) may appear restrictive and will certainly not be satisfied in all practical applications, we would like to emphasize that the PL model is, alongside the Mallows model, the standard model in the statistics of rank data. It is widely used in many fields of applied statistics, e.g., voting and discrete choice theory in economics. Its status in these fields is comparable to the status of the Gaussian distribution for real-valued data. Therefore, we are convinced that studying the dueling bandits problem under this assumption is a worthwhile endeavor. In this regard, we would also like to mention that the PL model has already been studied in the context of other preference learning problems (for example, see papers at ICML 2009 and 2010).

Rev 1: The confidence intervals in our paper are derived from Hoeffding's inequality in a standard way.
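For illustration, the generic construction can be sketched as follows. This is a minimal sketch of a two-sided Hoeffding interval for an estimated pairwise win probability, not the exact construction used in the paper: the function name `hoeffding_interval`, the confidence parameter `delta`, and the omission of any union bound over arm pairs or time steps are assumptions made for this example.

```python
import math


def hoeffding_interval(wins, n, delta=0.05):
    """Two-sided Hoeffding interval for a win probability.

    wins  : number of duels won by the first arm (hypothetical input)
    n     : total number of duels between the two arms
    delta : allowed failure probability (assumed, not from the paper)
    """
    if n <= 0:
        # No observations yet: the interval is the whole range [0, 1].
        return 0.0, 1.0
    p_hat = wins / n
    # Hoeffding: P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps^2).
    # Setting the right-hand side to delta and solving for eps gives:
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    # Clip to [0, 1] because p is a probability.
    return max(0.0, p_hat - radius), min(1.0, p_hat + radius)


if __name__ == "__main__":
    low, high = hoeffding_interval(wins=60, n=100, delta=0.05)
    print(f"estimate 0.60, interval [{low:.4f}, {high:.4f}]")
```

The radius shrinks at rate 1/sqrt(n), so the interval for a pair of arms excludes 1/2 once enough duels between them have been observed.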