Stochastic Online Greedy Learning with Semi-bandit Feedbacks

Neural Information Processing Systems

The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades. In this paper, we address the online learning problem when the input to the greedy algorithm is stochastic with unknown parameters that have to be learned over time. We first propose the greedy regret and $\epsilon$-quasi greedy regret as learning metrics that compare against the performance of the offline greedy algorithm. We then propose two online greedy learning algorithms with semi-bandit feedback, which use multi-armed bandit and pure exploration bandit policies at each level of greedy learning, one for each of the two regret metrics. Both algorithms achieve an $O(\log T)$ problem-dependent regret bound ($T$ being the time horizon) for a general class of combinatorial structures and reward functions that allow greedy solutions. We further show that the bound is tight in $T$ and other problem instance parameters.
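
The idea of running a bandit policy at each greedy level can be illustrated with a toy sketch. It is not the paper's algorithm: it assumes Bernoulli arms, a plain sum reward over the `k` chosen arms, and a standard UCB1 index. The name `online_greedy_ucb` and all parameters are hypothetical.

```python
import math
import random

def online_greedy_ucb(means, k, horizon, seed=0):
    """Toy online greedy learner: pick k arms per round, one greedy level at a time.

    Assumptions (not from the paper): Bernoulli arms with the given means,
    a sum reward, and a UCB1 index used at every greedy level.
    Requires k <= len(means).
    """
    rng = random.Random(seed)
    n = len(means)
    pulls = [0] * n       # number of times each arm was played
    totals = [0.0] * n    # summed observed rewards per arm

    def index(arm, t):
        if pulls[arm] == 0:
            return float("inf")  # force one initial observation of every arm
        mean = totals[arm] / pulls[arm]
        return mean + math.sqrt(2.0 * math.log(t) / pulls[arm])

    for t in range(1, horizon + 1):
        chosen = []
        for _level in range(k):
            # Greedy level: extend the partial solution with the best-looking arm.
            candidates = [a for a in range(n) if a not in chosen]
            chosen.append(max(candidates, key=lambda a: index(a, t)))
        # Semi-bandit feedback: the reward of every chosen arm is observed.
        for arm in chosen:
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            totals[arm] += reward
    return pulls

pulls = online_greedy_ucb([0.9, 0.8, 0.2, 0.1], k=2, horizon=2000)
```

With well-separated means, most of the plays should concentrate on the two best arms, which is the behaviour a logarithmic regret bound describes.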

The Power of Verification for Greedy Mechanism Design

Journal of Artificial Intelligence Research

Greedy algorithms are known to provide, in polynomial time, near-optimal approximation guarantees for Combinatorial Auctions (CAs) with multidimensional bidders. It is also known, however, that truthful greedy-like mechanisms for CAs with multi-minded bidders do not achieve good approximation guarantees. In this work, we seek a deeper understanding of greedy mechanism design and investigate under which general assumptions we can have efficient and truthful greedy mechanisms for CAs. Towards this goal, we use the framework of priority algorithms and weak and strong verification, where the bidders are not allowed to overbid on their winning set or on any subset of this set, respectively. We provide a complete characterization of the power of weak verification, showing that it is sufficient and necessary for any greedy fixed priority algorithm to become truthful with the use of money or not, depending on the ordering of the bids.
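
A greedy fixed priority allocation can be sketched as follows. This is only an illustration of the allocation step under stated assumptions: one bid per bidder, a priority of value divided by the square root of the bundle size (one commonly used ordering, chosen here for illustration), and no payments or verification. The function name `greedy_allocation` is hypothetical.

```python
import math

def greedy_allocation(bids):
    """Allocate bundles greedily in a fixed priority order.

    bids: list of (bidder, items, value) with a non-empty set of items.
    Assumption (not from the paper): priority = value / sqrt(|items|).
    Returns the winning bidders in the order they were accepted.
    """
    order = sorted(bids, key=lambda b: b[2] / math.sqrt(len(b[1])), reverse=True)
    taken = set()    # items already allocated
    winners = []
    for bidder, items, _value in order:
        # Accept a bid only if none of its items is already allocated.
        if taken.isdisjoint(items):
            winners.append(bidder)
            taken.update(items)
    return winners

bids = [
    ("a", {"x", "y"}, 10.0),
    ("b", {"x"}, 6.0),
    ("c", {"y"}, 5.0),
    ("d", {"z"}, 1.0),
]
winners = greedy_allocation(bids)  # ["a", "d"]
```

In this example the greedy outcome has welfare 11, while allocating to `b`, `c`, and `d` would give 12, so the greedy rule is approximate rather than optimal.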

Interlaced Greedy Algorithm for Maximization of Submodular Functions in Nearly Linear Time

Neural Information Processing Systems

A deterministic approximation algorithm is presented for the maximization of non-monotone submodular functions over a ground set of size $n$ subject to a cardinality constraint $k$; the algorithm is based upon the idea of interlacing two greedy procedures. The algorithm uses interlaced, thresholded greedy procedures to obtain a tight ratio of $1/4 - \epsilon$ in $O \left( \frac{n}{\epsilon} \log \left( \frac{k}{\epsilon} \right) \right)$ queries of the objective function, which improves upon both the ratio and the quadratic time complexity of the previously fastest deterministic algorithm for this problem. The algorithm is validated in the context of two applications of non-monotone submodular maximization, on which it outperforms the fastest deterministic and randomized algorithms in prior literature.
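
The interlacing idea can be sketched in a simplified form: two disjoint candidate sets are grown in alternation, each taking the free element with the largest marginal gain. This sketch omits the thresholding and the other details behind the stated ratio and query complexity, so it should not be read as the paper's algorithm. The graph cut objective and the names `cut_value` and `interlaced_greedy` are assumptions chosen for illustration.

```python
def cut_value(edges, subset):
    """Number of edges with exactly one endpoint in subset (non-monotone, submodular)."""
    return sum(1 for u, v in edges if (u in subset) != (v in subset))

def interlaced_greedy(ground, f, k):
    """Simplified interlaced greedy: grow two disjoint sets a and b alternately.

    Each set receives at most k elements; an element is added only if its
    marginal gain is positive. Returns the better of the two sets under f.
    """
    a, b = set(), set()
    for _ in range(k):
        for current in (a, b):
            free = [e for e in ground if e not in a and e not in b]
            if not free:
                break
            best = max(free, key=lambda e: f(current | {e}) - f(current))
            if f(current | {best}) - f(current) > 0:
                current.add(best)
    return max((a, b), key=f)

# Path graph 0 - 1 - 2 - 3 with cardinality constraint k = 2.
edges = [(0, 1), (1, 2), (2, 3)]
result = interlaced_greedy([0, 1, 2, 3], lambda s: cut_value(edges, s), k=2)
# result == {1, 3}, which cuts all three edges
```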

Mostly Exploration-Free Algorithms for Contextual Bandits

Machine Learning

The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate-optimal if there is sufficient randomness in the observed contexts. We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term covariate diversity. Furthermore, even absent this condition, we show that a greedy algorithm can be rate-optimal with nonzero probability. Thus, standard bandit algorithms may unnecessarily explore. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate-optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces experimentation and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling or UCB.
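
A purely greedy contextual bandit can be sketched as below. This is a toy illustration of the exploration-free policy, not of Greedy-First: it assumes a scalar context drawn uniformly from $[-1, 1]$, linear rewards without an intercept, Gaussian noise, and a per-arm least-squares estimate. All names and parameters are hypothetical.

```python
import random

def greedy_contextual_bandit(thetas, horizon, noise=0.1, seed=0):
    """Run an exploration-free greedy policy on a toy linear contextual bandit.

    Assumptions (not from the paper): reward of arm a is thetas[a] * x + noise,
    with scalar context x ~ Uniform(-1, 1). Each arm keeps a one-dimensional
    least-squares estimate of its slope.
    Returns a list of booleans: whether each round's choice was optimal.
    """
    rng = random.Random(seed)
    n_arms = len(thetas)
    sxx = [0.0] * n_arms  # sum of x^2 over rounds where the arm was played
    sxr = [0.0] * n_arms  # sum of x * reward over those rounds
    optimal = []
    for _ in range(horizon):
        x = rng.uniform(-1.0, 1.0)
        est = [sxr[a] / sxx[a] if sxx[a] > 0 else 0.0 for a in range(n_arms)]
        # Greedy step: play the arm with the highest estimated reward, no exploration.
        arm = max(range(n_arms), key=lambda a: est[a] * x)
        reward = thetas[arm] * x + rng.gauss(0.0, noise)
        sxx[arm] += x * x
        sxr[arm] += x * reward
        best = max(range(n_arms), key=lambda a: thetas[a] * x)
        optimal.append(arm == best)
    return optimal

history = greedy_contextual_bandit([1.0, -1.0], horizon=2000)
late_accuracy = sum(history[1000:]) / 1000
```

Because contexts of both signs keep arriving, each arm is played and its estimate keeps improving even though the policy never explores deliberately, which is the intuition behind a diversity condition on the contexts.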

Dictionary descent in optimization

Machine Learning

The problem of convex optimization is studied. Usually in convex optimization the minimization is over a $d$-dimensional domain. Very often the convergence rate of an optimization algorithm depends on the dimension $d$. The algorithms studied in this paper utilize dictionaries instead of the canonical basis used in coordinate descent algorithms. We show how this approach allows us to reduce the dimensionality of the problem. Also, we investigate which properties of a dictionary are beneficial for the convergence rate of typical greedy-type algorithms.
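
The contrast with coordinate descent can be sketched on a simple objective. The example assumes $f(x) = \frac{1}{2}\|x - b\|^2$, a greedy choice of the dictionary element with the largest absolute inner product with the gradient, and an exact line search; the algorithms in the paper may differ. The name `dictionary_descent` is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def dictionary_descent(b, dictionary, steps):
    """Greedy descent on f(x) = 0.5 * ||x - b||^2 along dictionary directions.

    Assumption (for illustration): at each step, pick the direction g that
    maximizes |<grad f(x), g>| and take the exact line-search step along it.
    """
    x = [0.0] * len(b)
    for _ in range(steps):
        grad = [xi - bi for xi, bi in zip(x, b)]
        g = max(dictionary, key=lambda d: abs(dot(grad, d)))
        gg = dot(g, g)
        if gg == 0.0:
            break  # degenerate direction, nothing to do
        step = -dot(grad, g) / gg  # exact minimizer of f along g
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return x

b = [3.0, 3.0, 0.0]
canonical = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
richer = canonical + [[1.0, 1.0, 0.0]]

x_coord = dictionary_descent(b, canonical, steps=1)  # [3.0, 0.0, 0.0]
x_dict = dictionary_descent(b, richer, steps=1)      # [3.0, 3.0, 0.0]
```

With the extra direction `[1, 1, 0]` in the dictionary, a single step reaches the minimizer, whereas the canonical basis needs one step per nonzero coordinate.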