Reviews: Adaptive Learning with Unknown Information Flows

Oct-7-2024, 21:32:14 GMT–Neural Information Processing Systems

Summary: The paper proposes a new multi-armed bandit (MAB) formulation in which, before each round, the agent may observe reward realizations of some of the arms, with these auxiliary observations arriving in an arbitrary pattern (the information flow). The paper studies how information flows affect regret by deriving a regret lower bound expressed in terms of the information flow. It then proposes an adaptive exploration policy whose regret matches this lower bound. However, the insight offered by this setting is somewhat unclear to the reviewer. Therefore, the reviewer votes for "weak accept".
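To make the setting concrete, here is a minimal simulation sketch, assuming Bernoulli arms and a hypothetical information flow in which each arm's reward is revealed for free with a fixed probability `aux_prob` before each round (the paper allows arbitrary, unknown flows). The policy is plain UCB1 that pools auxiliary samples into its estimates; it is an illustrative stand-in, not the adaptive exploration policy proposed in the paper. The names `run_ucb_with_aux` and `aux_prob` are hypothetical.

```python
import math
import random


def run_ucb_with_aux(means, horizon, aux_prob, seed=0):
    """Simulate UCB1 on Bernoulli arms under a hypothetical information flow.

    Before each round, each arm's reward is observed for free with probability
    aux_prob (an assumed flow model, not the paper's). Auxiliary samples update
    the same statistics as pulled samples, so arms that are observed often need
    less explicit exploration. Returns (cumulative pseudo-regret, pulls per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # total observations per arm (pulls + auxiliary)
    sums = [0.0] * k   # total observed reward per arm
    pulls = [0] * k    # arms actually chosen by the policy
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        # Information flow: free reward observations arrive before the decision.
        for a in range(k):
            if rng.random() < aux_prob:
                counts[a] += 1
                sums[a] += 1.0 if rng.random() < means[a] else 0.0
        # Pull any arm with no data yet; otherwise pick the largest UCB1 index.
        unseen = [a for a in range(k) if counts[a] == 0]
        if unseen:
            arm = unseen[0]
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls[arm] += 1
        regret += best - means[arm]
    return regret, pulls
```

Running the same seed with `aux_prob=0.0` and with a positive value is one quick way to see how free observations can reduce the exploration a policy has to pay for; no specific numbers are claimed here.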

adaptive learning, auxiliary information, unknown information flow
