
Collaborating Authors

 Glynn, Peter


Deep Learning for Computing Convergence Rates of Markov Chains

arXiv.org Machine Learning

Convergence rate analysis for general state-space Markov chains is fundamentally important in areas such as Markov chain Monte Carlo and algorithmic analysis (for computing explicit convergence bounds). This problem, however, is notoriously difficult because traditional analytical methods often do not generate practically useful convergence bounds for realistic Markov chains. We propose the Deep Contractive Drift Calculator (DCDC), the first general-purpose sample-based algorithm for bounding the convergence of Markov chains to stationarity in Wasserstein distance. The DCDC has two components. First, inspired by the new convergence analysis framework in (Qu et al., 2023), we introduce the Contractive Drift Equation (CDE), the solution of which leads to an explicit convergence bound. Second, we develop an efficient neural-network-based CDE solver. Equipped with these two components, DCDC solves the CDE and converts the solution into a convergence bound. We analyze the sample complexity of the algorithm and further demonstrate the effectiveness of the DCDC by generating convergence bounds for realistic Markov chains arising from stochastic processing networks as well as constant step-size stochastic optimization.


Optimal Sample Complexity for Average Reward Markov Decision Processes

arXiv.org Machine Learning

We settle the sample complexity of policy learning for the maximization of the long run average reward associated with a uniformly ergodic Markov decision process (MDP), assuming a generative model. In this context, the existing literature provides a sample complexity upper bound of $\widetilde O(|S||A|t_{\text{mix}}^2 \epsilon^{-2})$ and a lower bound of $\Omega(|S||A|t_{\text{mix}} \epsilon^{-2})$. In these expressions, $|S|$ and $|A|$ denote the cardinalities of the state and action spaces respectively, $t_{\text{mix}}$ serves as a uniform upper limit for the total variation mixing times, and $\epsilon$ signifies the error tolerance. Therefore, a notable gap of $t_{\text{mix}}$ still remains to be bridged. Our primary contribution is to establish an estimator for the optimal policy of average reward MDPs with a sample complexity of $\widetilde O(|S||A|t_{\text{mix}}\epsilon^{-2})$, effectively reaching the lower bound in the literature. This is achieved by combining algorithmic ideas in Jin and Sidford (2021) with those of Li et al. (2020).


Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes

arXiv.org Machine Learning

We consider the optimal sample complexity theory of tabular reinforcement learning (RL) for maximizing the infinite horizon discounted reward in a Markov decision process (MDP). Optimal worst-case complexity results have been developed for tabular RL problems in this setting, leading to a sample complexity dependence on $\gamma$ and $\epsilon$ of the form $\tilde \Theta((1-\gamma)^{-3}\epsilon^{-2})$, where $\gamma$ denotes the discount factor and $\epsilon$ is the solution error tolerance. However, in many applications of interest, the optimal policy (or all policies) induces mixing. We establish that in such settings, the optimal sample complexity dependence is $\tilde \Theta(t_{\text{mix}}(1-\gamma)^{-2}\epsilon^{-2})$, where $t_{\text{mix}}$ is the total variation mixing time. Our analysis is grounded in regeneration-type ideas, which we believe are of independent interest, as they can be used to study RL problems for general state space MDPs.
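
As a rough reading aid (not part of the abstract), the two rates quoted above can be compared directly. Ignoring the logarithmic factors hidden in $\tilde \Theta$, their ratio is:

```latex
\[
\frac{t_{\text{mix}}\,(1-\gamma)^{-2}\,\epsilon^{-2}}{(1-\gamma)^{-3}\,\epsilon^{-2}}
  \;=\; t_{\text{mix}}\,(1-\gamma),
\]
% so, up to logarithmic factors, the mixing-based rate is the smaller of the two
% exactly when t_mix < 1/(1-gamma).
```

For example, with $\gamma = 0.99$ the effective horizon $1/(1-\gamma)$ is 100, so under this comparison a mixing time of $t_{\text{mix}} = 10$ would correspond to roughly a tenfold smaller rate.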


Probabilistic Contraction Analysis of Iterated Random Operators

arXiv.org Artificial Intelligence

In many branches of engineering, the Banach contraction mapping theorem is employed to establish the convergence of certain deterministic algorithms. Randomized versions of these algorithms have been developed that have proved useful in data-driven problems. In a class of randomized algorithms, in each iteration, the contraction map is approximated with an operator that uses independent and identically distributed samples of certain random variables. This leads to iterated random operators acting on an initial point in a complete metric space, and it generates a Markov chain. In this paper, we develop a new stochastic-dominance-based proof technique, called probabilistic contraction analysis, for establishing the convergence in probability of Markov chains generated by such iterated random operators in a certain limiting regime. The methods developed in this paper provide a general framework for understanding the convergence of a wide variety of Monte Carlo methods in which a contractive property is present. We apply the convergence result to conclude the convergence of fitted value iteration and fitted relative value iteration in continuous state and continuous action Markov decision problems as representative applications of the general framework developed here.
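
The idea of an iterated random operator can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the scalar contraction `T(x) = 0.5 * x + 1`, the Gaussian noise model, and the function names. Each iteration replaces the exact map with an average of i.i.d. noisy evaluations, so the iterates form a Markov chain that should concentrate near the fixed point `x* = 2` as the per-iteration sample size grows.

```python
import random


def exact_operator(x):
    """Deterministic contraction T(x) = 0.5 * x + 1 (fixed point x* = 2)."""
    return 0.5 * x + 1.0


def random_operator(x, n_samples, rng):
    """Approximate T(x) by averaging n_samples i.i.d. noisy evaluations.

    The additive standard Gaussian noise is a hypothetical stand-in for
    sampling error in a data-driven algorithm.
    """
    noisy = [exact_operator(x) + rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return sum(noisy) / n_samples


def iterate(x0, n_iters, n_samples, seed=0):
    """Apply the random operator n_iters times starting from x0."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_iters):
        x = random_operator(x, n_samples, rng)
    return x


if __name__ == "__main__":
    # Larger per-iteration sample sizes should land closer to x* = 2.
    for n in (10, 1000):
        print(n, iterate(x0=10.0, n_iters=50, n_samples=n))
```

The sketch only shows the mechanics of the iteration; it makes no claim about the rates or regimes analyzed in the paper.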


Surgical Scheduling via Optimization and Machine Learning with Long-Tailed Data

arXiv.org Artificial Intelligence

Using data from cardiovascular surgery patients with long and highly variable post-surgical lengths of stay (LOS), we develop a modeling framework to reduce recovery unit congestion. We estimate the LOS and its probability distribution using machine learning models, schedule procedures on a rolling basis using a variety of optimization models, and estimate performance with simulation. The machine learning models achieved only modest LOS prediction accuracy, despite access to a very rich set of patient characteristics. Compared to the current paper-based system used in the hospital, most optimization models failed to reduce congestion without increasing wait times for surgery. A conservative stochastic optimization with sufficient sampling to capture the long tail of the LOS distribution outperformed the current manual process and other stochastic and robust optimization approaches. These results highlight the perils of using oversimplified distributional models of LOS for scheduling procedures and the importance of using optimization methods well-suited to dealing with long-tailed behavior.


Optimal best arm selection for general distributions

arXiv.org Machine Learning

Given a finite set of unknown distributions (or arms) that can be sampled from, we consider the problem of identifying the one with the largest mean using a delta-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified delta) that has minimum sample complexity. Lower bounds for delta-correct algorithms are well known. Further, delta-correct algorithms that match the lower bound asymptotically as delta reduces to zero have also been developed in the literature when the arm distributions are restricted to a single-parameter exponential family. In this paper, we first observe a negative result: some restrictions are essential, as otherwise, under a delta-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a delta-correct algorithm that matches the lower bound as delta reduces to zero under the mild restriction that a known bound exists on the expectation of a non-negative, increasing convex function (for example, the squared moment) of the underlying random variables. We also propose batch processing and identify optimal batch sizes to substantially speed up the proposed algorithm. This best arm selection problem is a well-studied classic problem in the simulation community. It has many learning applications, including in recommendation systems and in product selection.


Optimal Transport Relaxations with Application to Wasserstein GANs

arXiv.org Machine Learning

We propose a family of relaxations of the optimal transport problem which regularize the problem by introducing an additional minimization step over a small region around one of the underlying transporting measures. The type of regularization that we obtain is related to smoothing techniques studied in the optimization literature. When using our approach to estimate optimal transport costs based on empirical measures, we obtain statistical learning bounds which are useful to guide the amount of regularization, while maintaining good generalization properties. To illustrate the computational advantages of our regularization approach, we apply our method to training Wasserstein GANs. We obtain running time improvements, relative to current benchmarks, with no deterioration in testing performance (via FID). The running time improvement occurs because our new optimality-based threshold criterion reduces the number of expensive iterates of the generating networks, while increasing the number of actor-critic iterations.


Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning

arXiv.org Machine Learning

The goal of this paper is to provide a unifying view of a wide range of problems of interest in machine learning by framing them as the minimization of functionals defined on the space of probability measures. In particular, we show that generative adversarial networks, variational inference, and actor-critic methods in reinforcement learning can all be seen through the lens of our framework. We then discuss a generic optimization algorithm for our formulation, called probability functional descent (PFD), and show how this algorithm recovers existing methods developed independently in the settings mentioned earlier.


An Accelerated Approach to Safely and Efficiently Test Pre-produced Autonomous Vehicles on Public Streets

arXiv.org Artificial Intelligence

Various automobile and mobility companies, for instance, Ford, Uber, and Waymo, are currently testing their pre-produced autonomous vehicle (AV) fleets on public roads. However, due to the rarity of safety-critical cases and the effectively unlimited number of possible traffic scenarios, these on-road testing efforts have been acknowledged as tedious, costly, and risky. In this study, we propose the Accelerated Deployment framework to safely and efficiently estimate AV performance on public streets. We show that by appropriately addressing the gradual accuracy improvement and adaptively selecting meaningful and safe environments under which the AV is deployed, the proposed framework yields highly accurate estimation with much faster evaluation time and, more importantly, lower deployment risk. Our findings provide an answer to the currently heated and active discussions on how to properly test AV performance on public roads so as to achieve a safe, efficient, and statistically reliable testing framework for AV technologies.


Selecting the best system, large deviations, and multi-armed bandits

arXiv.org Machine Learning

Consider the problem of finding a population amongst many with the largest mean when these means are unknown but population samples can be generated via simulation. Typically, by selecting a population with the largest sample mean, it can be shown that the false selection probability decays at an exponential rate. Lately, researchers have sought algorithms that guarantee that this probability is restricted to a small $\delta$ in order $\log(1/\delta)$ computational time by estimating the associated large deviations rate function via simulation. We show that such guarantees are misleading. En route, we identify the large deviations principle followed by the empirically estimated large deviations rate function, which may also be of independent interest. Further, we show a negative result that when populations have unbounded support, under mild restrictions, any policy that asymptotically identifies the correct population with probability at least $1-\delta$ for each problem instance requires more than $O(\log(1/\delta))$ samples in making such a determination in any problem instance. This suggests that some restrictions on populations are essential to devise $O(\log(1/\delta))$ algorithms with $1 - \delta$ correctness guarantees. We note that under restrictions on population moments, such methods are easily designed. We also observe that sequential methods from the stochastic multi-armed bandit literature can be adapted to devise such algorithms.
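
The baseline rule mentioned above, selecting the population with the largest sample mean, can be sketched as follows. This is a minimal illustration only: the Gaussian populations, the fixed equal sample size, and the names `select_best` and `gaussian` are assumptions introduced here, and the sketch does not implement the rate-function estimation or the sequential methods discussed in the abstract.

```python
import random


def select_best(samplers, n_samples, rng):
    """Return the index of the population with the largest sample mean.

    samplers: list of callables, each taking an rng and returning one draw.
    n_samples: number of draws taken from every population (fixed budget).
    """
    sample_means = []
    for draw in samplers:
        total = sum(draw(rng) for _ in range(n_samples))
        sample_means.append(total / n_samples)
    # Index of the maximum; ties resolve to the lowest index.
    return max(range(len(sample_means)), key=sample_means.__getitem__)


def gaussian(mean, std=1.0):
    """Hypothetical population: Gaussian with the given mean and std."""
    return lambda rng: rng.gauss(mean, std)


if __name__ == "__main__":
    rng = random.Random(1)
    populations = [gaussian(0.0), gaussian(0.5), gaussian(1.0)]
    print(select_best(populations, n_samples=2000, rng=rng))
```

With light-tailed populations like these, the chance of a wrong selection shrinks quickly as `n_samples` grows; the abstract's negative result concerns what can be guaranteed without such restrictions.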