full information
Reviewer #1: We tried to further explain the connection to Neu and Zhivotovskiy in Appendix E, who consider the
We would like to thank all referees for their close reading of the manuscript. The second reason to include the multiclass setting is the bandit setting. For an overview of the different bounds, we provided Table 1. The parameters can be found in Section 2. Importantly, Gaptron is often on par with, if not better than, slower algorithms such as ONS. This lower bound does not apply to our setting, where the learner suffers the zero-one loss.
06d5ae105ea1bea4d800bc96491876e9-AuthorFeedback.pdf
We thank all the reviewers for the constructive comments. We address the major concerns below. Reproducibility: 1) learning to draft details; 2) feature details; 3) discussions on the computing resources used. The search tree is updated based on four steps of MCTS. The learning rate is set to 0.001 with Adam.
Beyond Bandit Feedback in Online Multiclass Classification
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph. While including bandit feedback as a special case, feedback graphs allow a much richer set of applications, including filtering and label efficient classification.
Reviews: Regret Bounds for Online Portfolio Selection with a Cardinality Constraint
Summary: The paper studies the online portfolio selection problem under cardinality constraints and provides two algorithms that achieve sublinear regret. One algorithm handles the full information setting and the other handles the bandit feedback setting. The paper also provides lower bounds for both settings. Both algorithms split the problem into two learning problems: one is to learn the optimal combination of assets, and the other is to learn the optimal portfolio. To learn the optimal combination of assets, a version of either the multiplicative weights algorithm (full information) or Exp3 (bandit feedback) is used.
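As a rough illustration of the full information case, the following is a minimal sketch of the standard multiplicative weights (Hedge) update applied to candidate asset combinations. It is not the reviewed paper's algorithm: the asset names, the cardinality k = 2, the loss values, and the step size `eta` are all made-up assumptions, and the helper names `hedge_update` and `run_hedge` are hypothetical.

```python
import itertools
import math


def hedge_update(weights, losses, eta):
    """One multiplicative-weights (Hedge) step; returns normalized weights."""
    updated = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]


def run_hedge(loss_rounds, eta):
    """Start from uniform weights and apply one update per round of losses."""
    n = len(loss_rounds[0])
    weights = [1.0 / n] * n
    for losses in loss_rounds:
        weights = hedge_update(weights, losses, eta)
    return weights


# Hypothetical setup: 3 assets, at most k = 2 held at once.
assets = ["A", "B", "C"]
combos = list(itertools.combinations(assets, 2))

# Made-up losses in [0, 1], one per combination per round (full information:
# the loss of every combination is observed, not only the one played).
loss_rounds = [
    [0.2, 0.9, 0.8],
    [0.1, 0.7, 0.9],
    [0.3, 0.8, 0.6],
]

final = run_hedge(loss_rounds, eta=1.0)
best = combos[max(range(len(combos)), key=final.__getitem__)]
```

Under these invented losses the weight concentrates on the combination with the smallest cumulative loss. In the bandit setting only the played combination's loss would be observed, which is where an Exp3-style importance-weighted estimate would replace the full loss vector.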
Learning from Streaming Data when Users Choose
In digital markets comprised of many competing services, each user chooses between multiple service providers according to their preferences, and the chosen service makes use of the user data to incrementally improve its model. The service providers' models influence which service the user will choose at the next time step, and the user's choice, in return, influences the model update,
Moreover, due to the data-driven nature of digital platforms, interesting dynamics emerge among users and service providers: on the one hand, users choose amongst providers based on the quality of their services; on the other hand, providers use the user data to improve and update their services, affecting future user choices (Ginart et al., 2021; Kwon et al., 2022; Dean et al., 2024; Jagadeesan et al., 2023a). For example, in personalized music streaming, a user chooses amongst different music streaming platforms based on how well they meet the user's needs.
Bayesian Persuasion for Algorithmic Recourse
Keegan Harris, Valerie Chen, Joon Sik Kim, Ameet Talwalkar, Hoda Heidari, Zhiwei Steven Wu
When subjected to automated decision-making, decision subjects may strategically modify their observable features in ways they believe will maximize their chances of receiving a favorable decision. In many practical situations, the underlying assessment rule is deliberately kept secret to avoid gaming and maintain competitive advantage. The resulting opacity forces the decision subjects to rely on incomplete information when making strategic feature modifications. We capture such settings as a game of Bayesian persuasion, in which the decision maker offers a form of recourse to the decision subject by providing them with an action recommendation (or signal) to incentivize them to modify their features in desirable ways. We show that when using persuasion, the decision maker and decision subject are never worse off in expectation, while the decision maker can be significantly better off. While the decision maker's problem of finding the optimal Bayesian incentive-compatible (BIC) signaling policy takes the form of optimization over infinitely-many variables, we show that this optimization can be cast as a linear program over finitely-many regions of the space of possible assessment rules. While this reformulation simplifies the problem dramatically, solving the linear program requires reasoning about exponentially-many variables, even in relatively simple cases. Motivated by this observation, we provide a polynomial-time approximation scheme that recovers a near-optimal signaling policy. Finally, our numerical simulations on semi-synthetic data empirically demonstrate the benefits of using persuasion in the algorithmic recourse setting.