Goto

Collaborating Authors

 minimax optimal submodular maximization


Nearly Minimax Optimal Submodular Maximization with Bandit Feedback

Neural Information Processing Systems

We consider maximizing an unknown monotonic, submodular set function f: 2 {[n]} \rightarrow [0,1] with cardinality constraint under stochastic bandit feedback. At each time t 1,\dots,T the learner chooses a set S_t \subset [n] with S_t \leq k and receives reward f(S_t) \eta_t where \eta_t is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum f(S_*) with S_* k, obtained through robust greedy maximization of f . To date, the best regret bound in the literature scales as k n {1/3} T {2/3} . And by trivially treating every set as a unique arm one deduces that \sqrt{ {n \choose k} T } is also achievable using standard multi-armed bandit algorithms.