No-Regret $M^{\natural}$-Concave Function Maximization: Stochastic Bandit Algorithms and NP-Hardness of Adversarial Full-Information Setting

Neural Information Processing Systems 

In practice, perfect knowledge of $M^{\natural}$-concave functions is often unavailable a priori, and we can optimize them only interactively based on some feedback. Motivated by such situations, we study online $M^{\natural}$-concave function maximization problems, which are interactive versions of the problem studied by Murota and Shioura (1999). A key to proving our positive results for the stochastic setting is the robustness of the greedy algorithm to local errors in $M^{\natural}$-concave function maximization, which is one of our main technical results. In contrast to these positive results, another main result of our work is an impossibility in the adversarial setting. We prove that, even with full-information feedback, no algorithm that runs in polynomial time per round can achieve $O(T^{1-c})$ regret for any constant $c > 0$ unless $\mathsf{P} = \mathsf{NP}$.
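To make the role of the greedy algorithm concrete, the following is a minimal sketch of greedy maximization over the integer lattice, in the spirit of the offline problem of Murota and Shioura (1999). It is illustrative only, not the paper's algorithm: the objective used is a separable concave function (a simple special case of $M^{\natural}$-concavity), the box domain $\{0,\dots,B\}^n$ and all names (`greedy_maximize`, `marginal_gain`) are our own assumptions for demonstration.

```python
# Illustrative sketch: greedy maximization of an M♮-concave function.
# For M♮-concave objectives, repeatedly taking the unit increment with
# the largest marginal gain is globally optimal (Murota-Shioura style);
# the function and domain below are hypothetical choices for the demo.

def greedy_maximize(marginal_gain, n, budget):
    """Start at the origin; greedily apply the best unit increment
    until no increment has a positive marginal gain."""
    x = [0] * n
    while True:
        gains = [
            marginal_gain(x, i) if x[i] < budget else float("-inf")
            for i in range(n)
        ]
        best = max(range(n), key=lambda i: gains[i])
        if gains[best] <= 0:
            return x
        x[best] += 1

# Example objective: f(x) = sum_i (a_i * x_i - x_i^2), a separable
# concave (hence M♮-concave) function. The marginal gain of the
# increment x_i -> x_i + 1 is a_i - (2 * x_i + 1).
a = [3, 5, 2]
f = lambda x: sum(ai * xi - xi * xi for ai, xi in zip(a, x))
gain = lambda x, i: a[i] - (2 * x[i] + 1)

x_star = greedy_maximize(gain, n=3, budget=5)
print(x_star, f(x_star))  # greedy reaches the global maximum here
```

The paper's stochastic bandit results concern the setting where `gain` is observed only through noisy feedback; the robustness result cited in the abstract says, roughly, that this greedy rule degrades gracefully when each local comparison is made with bounded error.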