Goto

Collaborating Authors

 pq 1


Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

Neural Information Processing Systems

Consider a player that in each of T rounds chooses one of K arms. An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays {dt} that are unknown to the player. After picking arm at at round t, the player receives the cost of playing this arm dt rounds later. In cases where t + dt > T, this feedback is simply missing.


On the Convergence of Coordinate Ascent Variational Inference

Bhattacharya, Anirban, Pati, Debdeep, Yang, Yun

arXiv.org Artificial Intelligence

As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming more and more popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justifications of VI by proving its statistical optimality for parameter estimation under various settings; meanwhile, formal analysis on the algorithmic convergence aspects of VI is still largely lacking. In this paper, we consider the common coordinate ascent variational inference (CAVI) algorithm for implementing the mean-field (MF) VI towards optimizing a Kullback--Leibler divergence objective functional over the space of all factorized distributions. Focusing on the two-block case, we analyze the convergence of CAVI by leveraging the extensive toolbox from functional analysis and optimization. We provide general conditions for certifying global or local exponential convergence of CAVI. Specifically, a new notion of generalized correlation for characterizing the interaction between the constituting blocks in influencing the VI objective functional is introduced, which according to the theory, quantifies the algorithmic contraction rate of two-block CAVI. As illustrations, we apply the developed theory to a number of examples, and derive explicit problem-dependent upper bounds on the algorithmic contraction rate.


Agility and Target Distribution in the Dynamic Stochastic Traveling Salesman Problem

Adler, Aviv, Gal, Oren, Karaman, Sertac

arXiv.org Artificial Intelligence

An important variant of the classic Traveling Salesman Problem (TSP) is the Dynamic TSP, in which a system with dynamic constraints is tasked with visiting a set of n target locations (in any order) in the shortest amount of time. Such tasks arise naturally in many robotic motion planning problems, particularly in exploration, surveillance and reconnaissance, and classical TSP algorithms on graphs are typically inapplicable in this setting. An important question about such problems is: if the target points are random, what is the length of the tour (either in expectation or as a concentration bound) as n grows? This problem is the Dynamic Stochastic TSP (DSTSP), and has been studied both for specific important vehicle models and for general dynamic systems; however, in general only the order of growth is known. In this work, we explore the connection between the distribution from which the targets are drawn and the dynamics of the system, yielding a more precise lower bound on tour length as well as a matching upper bound for the case of symmetric (or driftless) systems. We then extend the symmetric dynamics results to the case when the points are selected by a (non-random) adversary whose goal is to maximize the length, thus showing worst-case bounds on the tour length.


Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Khamaru, Koulik, Xia, Eric, Wainwright, Martin J., Jordan, Michael I.

arXiv.org Machine Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain \textit{ex post} the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a data-dependent stopping rule for instance-optimal algorithms. The proposed stopping rule adapts to the instance-specific difficulty of the problem and allows for early termination for problems with favorable structure.


COVID-19 Pandemic Cyclic Lockdown Optimization Using Reinforcement Learning

Arango, Mauricio, Pelov, Lyudmil

arXiv.org Artificial Intelligence

This work examines the use of reinforcement learning (RL) to optimize cyclic lockdowns, which is one of the methods available for control of the COVID-19 pandemic. The problem is structured as an optimal control system for tracking a reference value, corresponding to the maximum usage level of a critical resource, such as ICU beds. However, instead of using conventional optimal control methods, RL is used to find optimal control policies. A framework was developed to calculate optimal cyclic lockdown timings using an RL-based on-off controller. The RL-based controller is implemented as an RL agent that interacts with an epidemic simulator, implemented as an extended SEIR epidemic model. The RL agent learns a policy function that produces an optimal sequence of open/lockdown decisions such that goals specified in the RL reward function are optimized. Two concurrent goals were used: the first one is a public health goal that minimizes overshoots of ICU bed usage above an ICU bed threshold, and the second one is a socio-economic goal that minimizes the time spent under lockdowns. It is assumed that cyclic lockdowns are considered as a temporary alternative to extended lockdowns when a region faces imminent danger of overpassing resource capacity limits and when imposing an extended lockdown would cause severe social and economic consequences due to lack of necessary economic resources to support its affected population during an extended lockdown.


A Mutual Contamination Analysis of Mixed Membership and Partial Label Models

Katz-Samuels, Julian, Scott, Clayton

arXiv.org Machine Learning

Many machine learning problems can be characterized by mutual contamination models. In these problems, one observes several random samples from different convex combinations of a set of unknown base distributions. It is of interest to decontaminate mutual contamination models, i.e., to recover the base distributions either exactly or up to a permutation. This paper considers the general setting where the base distributions are defined on arbitrary probability spaces. We examine the decontamination problem in two mutual contamination models that describe popular machine learning tasks: recovering the base distributions up to a permutation in a mixed membership model, and recovering the base distributions exactly in a partial label model for classification. We give necessary and sufficient conditions for identifiability of both mutual contamination models, algorithms for both problems in the infinite and finite sample cases, and introduce novel proof techniques based on affine geometry.


Learning a Robust Consensus Matrix for Clustering Ensemble via Kullback-Leibler Divergence Minimization

Zhou, Peng (Chinese Academy of Sciences) | Du, Liang (Chinese Academy of Sciences) | Wang, Hanmo (Chinese Academy of Sciences) | Shi, Lei (Chinese Academy of Sciences) | Shen, Yi-Dong (Chinese Academy of Sciences)

AAAI Conferences

Clustering ensemble has emerged as an important extension of the classical clustering problem. It provides a framework for combining multiple base clusterings of a data set to generate a final consensus result. Most existing clustering methods simply combine clustering results without taking into account the noises, which may degrade the clustering performance. In this paper, we propose a novel robust clustering ensemble method. To improve the robustness, we capture the sparse and symmetric errors and integrate them into our robust and consensus framework to learn a low-rank matrix. Since the optimization of the objective function is difficult to solve, we develop a block coordinate descent algorithm which is theoretically guaranteed to converge. Experimental results on real world data sets demonstrate the effectiveness of our method.