Search
Context-lumpable stochastic bandits
We consider a contextual bandit problem with S contexts and K actions. In each round t = 1,2,... the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into r min{S,K}groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an ฮต-optimal policy after using at most eO(r(S+K)/ฮต2) samples with high probability and provide a matching โฆ(r(S + K)/ฮต2) lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time T is bounded by eO( p r3(S+K)T). To the best of our knowledge, we are the first to show the near-optimal sample complexity in the PAC setting and eO( p poly(r)(S+K)T)minimax regret in the online setting for this problem. We also show our algorithms can be applied to more general low-rank bandits and get improved regret bounds in some scenarios.
k-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy
We propose a new initialization scheme for the k-median problem in the general metric space (e.g., discrete space induced by graphs), based on the construction of metric embedding tree structure of the data. We propose a novel and efficient search algorithm which finds initial centers that can be used subsequently for the local search algorithm. The so-called HST initialization method can produce initial centers achieving lower error than those from another popular method k-median++, also with higher efficiency when k is not too small. Our HST initialization are then extended to the setting of differential privacy (DP) to generate private initial centers. We show that the error of applying DP local search followed by our private HST initialization improves prior results on the approximation error, and approaches the lower bound within a small factor. Experiments demonstrate the effectiveness of our proposed methods.
BanditPAM++: Faster k-medoids Clustering
Clustering is a fundamental task in data science with wide-ranging applications. In k-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used; these features allow for greater interpretability of the cluster centers and the clustering of exotic objects in k-medoids clustering, respectively.
Certified Minimax Unlearning with Generalization Rates and Deletion Capacity
We study the problem of (ฯต,ฮด)-certified machine unlearning for minimax models. Most of the existing works focus on unlearning from standard statistical learning models that have a single variable and their unlearning steps hinge on the direct Hessian-based conventional Newton update. We develop a new (ฯต,ฮด)-certified machine unlearning algorithm for minimax models. It proposes a minimax unlearning step consisting of a total Hessian-based complete Newton update and the Gaussian mechanism borrowed from differential privacy. To obtain the unlearning certification, our method injects calibrated Gaussian noises by carefully analyzing the "sensitivity" of the minimax unlearning step (i.e., the closeness between the minimax unlearning variables and the retraining-from-scratch variables).
Certified Minimax Unlearning with Generalization Rates and Deletion Capacity
We study the problem of (ฯต,ฮด)-certified machine unlearning for minimax models. Most of the existing works focus on unlearning from standard statistical learning models that have a single variable and their unlearning steps hinge on the direct Hessian-based conventional Newton update. We develop a new (ฯต,ฮด)-certified machine unlearning algorithm for minimax models. It proposes a minimax unlearning step consisting of a total Hessian-based complete Newton update and the Gaussian mechanism borrowed from differential privacy. To obtain the unlearning certification, our method injects calibrated Gaussian noises by carefully analyzing the "sensitivity" of the minimax unlearning step (i.e., the closeness between the minimax unlearning variables and the retraining-from-scratch variables).
Online combinatorial optimization with stochastic decision sets and adversarial losses
Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.