Reviews: Adaptive Learning with Unknown Information Flows

Neural Information Processing Systems 

Summary: The paper proposes a new multi-armed bandit (MAB) formulation in which, before each round, the agent may observe reward realizations of some of the arms, with these observations arriving arbitrarily. The paper studies the impact of such information flows on regret by deriving a regret lower bound as a function of the information flow. Moreover, the paper proposes an adaptive exploration policy that matches this lower bound. However, the insight gained from this setting remains somewhat unclear to the reviewer. The reviewer therefore recommends a "weak accept".
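
To make the setting in the summary concrete, the following is a minimal sketch of a bandit with auxiliary observations. It is an illustration under stated assumptions, not the paper's policy: the function name `run_bandit`, the Bernoulli rewards, and the parameter `aux_prob` (the chance that each arm yields one free sample before a round) are all hypothetical, and the agent is plain UCB1 that simply counts free samples alongside pulled ones.

```python
import math
import random


def run_bandit(means, horizon, aux_prob, seed=0):
    """Simulate a Bernoulli bandit with auxiliary observations; return pseudo-regret.

    Hypothetical setup: before each round, every arm independently yields one
    free reward sample with probability aux_prob. The agent is plain UCB1 that
    counts free samples like pulled ones (not the paper's policy).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # observations per arm (pulled + auxiliary)
    sums = [0.0] * k   # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    def observe(arm):
        # Draw one Bernoulli reward for the arm and record it.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    for t in range(1, horizon + 1):
        # Information flow: free samples arrive before the decision.
        for arm in range(k):
            if rng.random() < aux_prob:
                observe(arm)

        def index(arm):
            # UCB1 index; arms never observed are tried first.
            if counts[arm] == 0:
                return float("inf")
            # log(t + 1) keeps the bonus positive in the first round.
            bonus = math.sqrt(2.0 * math.log(t + 1) / counts[arm])
            return sums[arm] / counts[arm] + bonus

        chosen = max(range(k), key=index)
        observe(chosen)
        # Only pulled arms incur regret; free samples are costless.
        regret += best - means[chosen]

    return regret
```

Calling `run_bandit([0.3, 0.7], 2000, 0.0)` and `run_bandit([0.3, 0.7], 2000, 0.2)` over several seeds gives a rough feel for how free observations can change the amount of exploration the agent has to pay for. A single run is noisy, so no particular ordering of the two values should be expected from one seed.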