Bhattacharyya, Arnab
Approximating the Total Variation Distance between Gaussians
Bhattacharyya, Arnab, Feng, Weiming, Srivastava, Piyush
The total variation distance is a metric of central importance in statistics and probability theory. Somewhat surprisingly, however, the question of computing it algorithmically appears not to have been systematically studied until very recently. In this paper, we contribute to this line of work by studying the question in the important special case of multivariate Gaussians. More formally, we consider the problem of approximating the total variation distance between two multivariate Gaussians to within an $\epsilon$-relative error. Previous works achieved a fixed constant relative-error approximation via closed-form formulas. In this work, we give algorithms that, given any two $n$-dimensional Gaussians $D_1, D_2$ and any error bound $\epsilon > 0$, approximate the total variation distance $D := d_{TV}(D_1, D_2)$ to $\epsilon$-relative accuracy in $\text{poly}(n, \frac{1}{\epsilon}, \log \frac{1}{D})$ operations. The main technical tool in our work is a reduction that lets us extend the recent progress on computing the TV-distance between discrete random variables to our continuous setting.
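As a point of reference for the quantity being approximated, here is a brute-force numerical baseline for the univariate case. This is not the paper's algorithm (which handles $n$-dimensional Gaussians with relative-error guarantees); it simply evaluates $d_{TV} = \frac{1}{2}\int |p_1(x) - p_2(x)|\,dx$ on a grid.

```python
import numpy as np
from scipy.stats import norm

def tv_distance_1d(mu1, sigma1, mu2, sigma2, grid_size=200_000):
    """Brute-force d_TV between two univariate Gaussians:
    d_TV = (1/2) * integral |p1(x) - p2(x)| dx, approximated on a grid."""
    lo = min(mu1 - 10 * sigma1, mu2 - 10 * sigma2)
    hi = max(mu1 + 10 * sigma1, mu2 + 10 * sigma2)
    xs = np.linspace(lo, hi, grid_size)
    p1 = norm.pdf(xs, mu1, sigma1)
    p2 = norm.pdf(xs, mu2, sigma2)
    return 0.5 * np.trapz(np.abs(p1 - p2), xs)

# Sanity check: equal-variance Gaussians admit a closed form,
# d_TV = 2 * Phi(|mu1 - mu2| / (2 * sigma)) - 1.
print(tv_distance_1d(0.0, 1.0, 1.0, 1.0))   # approx 0.38292
print(2 * norm.cdf(0.5) - 1)                # same value
```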
Learning multivariate Gaussians with imperfect advice
Bhattacharyya, Arnab, Choo, Davin, John, Philips George, Gouleakis, Themis
The problem of approximating an underlying distribution from its observed samples is a fundamental scientific problem. The distribution learning problem has been studied for more than a century in statistics, and it is the underlying engine for much of applied machine learning. The emphasis in modern applications is on high-dimensional distributions, with the goal being to understand when one can escape the curse of dimensionality. The survey by [Dia16] gives an excellent overview of classical and modern techniques for distribution learning, especially when there is some underlying structure to be exploited. In this work, we investigate how to go beyond worst-case sample complexities for learning distributions. We consider the situation where the algorithm is also given the aid of possibly imperfect advice regarding the input distribution. We position our study in the context of algorithms with predictions, where the usual problem input is supplemented by "predictions" or "advice" (potentially drawn from modern machine learning models) and the algorithm's goal is to incorporate the advice in a way that improves performance if the advice is of high quality, while if the advice is inaccurate, performance should not degrade below that of the no-advice setting. Most previous work in this setting is in the context of online algorithms, e.g. for the ski-rental problem [GP19, WLW20, ADJ
Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs
John, Philips George, Bhattacharyya, Arnab, Maniu, Silviu, Myrisiotis, Dimitrios, Wu, Zhenan
Reinforcement learning algorithms are usually stated without theoretical guarantees regarding their performance. Recently, Jin, Yang, Wang, and Jordan (COLT 2020) showed a polynomial-time reinforcement learning algorithm (namely, LSVI-UCB) for the setting of linear Markov decision processes, and provided theoretical guarantees regarding its running time and regret. In real-world scenarios, however, the space usage of this algorithm can be prohibitive due to the linear regression step that it performs. We propose and analyze two modifications of LSVI-UCB, which alternate periods of learning and not-learning, to reduce space and time usage while maintaining sublinear regret. We show experimentally, on synthetic data and real-world benchmarks, that our algorithms achieve low space usage and running time while not significantly sacrificing regret.
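To make the bottleneck concrete, here is a minimal sketch of the ridge-regression step at the core of LSVI-UCB, which the vanilla algorithm re-solves every episode over its entire history. All names and dimensions are illustrative, and the `should_learn` schedule is a hypothetical stand-in for (not a statement of) the paper's alternating learn/not-learn periods.

```python
import numpy as np

d, lam = 8, 1.0                  # illustrative feature dimension and ridge parameter
features, targets = [], []       # all past phi(s, a) and regression targets;
                                 # this history grows linearly with the episode count,
                                 # which is the space cost being reduced

def refit_weights():
    """Solve w = (lam*I + X^T X)^{-1} X^T y over the entire history."""
    X, y = np.array(features), np.array(targets)
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

def should_learn(episode):
    """Hypothetical sparse schedule: refit only at power-of-two episodes,
    standing in for the paper's alternating learning / not-learning periods."""
    return episode & (episode - 1) == 0
```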
Probably approximately correct high-dimensional causal effect estimation given a valid adjustment set
Choo, Davin, Squires, Chandler, Bhattacharyya, Arnab, Sontag, David
Accurate estimates of causal effects play a key role in decision-making across applications such as healthcare, economics, and operations. In the absence of randomized experiments, a common approach to estimating causal effects uses covariate adjustment. In this paper, we study covariate adjustment for discrete distributions from the PAC learning perspective, assuming knowledge of a valid adjustment set $\mathbf{Z}$, which might be high-dimensional. Our first main result PAC-bounds the estimation error of covariate adjustment by a term that is exponential in the size of the adjustment set; it is known that such a dependency is unavoidable even if one only aims to minimize the mean squared error. Motivated by this result, we introduce the notion of an $\varepsilon$-Markov blanket, give bounds on the misspecification error of using such a set for covariate adjustment, and provide an algorithm for $\varepsilon$-Markov blanket discovery; our second main result upper bounds the sample complexity of this algorithm. Furthermore, we provide a misspecification error bound and a constraint-based algorithm that allow us to go beyond $\varepsilon$-Markov blankets to even smaller adjustment sets. Our third main result upper bounds the sample complexity of this algorithm, and our final result combines the first three into an overall PAC bound. Altogether, our results highlight that one does not need to perfectly recover causal structure in order to ensure accurate estimates of causal effects.
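For intuition, here is a minimal plug-in sketch of the covariate-adjustment (backdoor) estimator for discrete data, $\widehat{P}(Y = y \mid do(X = x)) = \sum_z \widehat{P}(z)\,\widehat{P}(y \mid x, z)$. Function and variable names are illustrative; the exponential dependence on the size of $\mathbf{Z}$ shows up as the number of strata $z$ that must each receive enough samples.

```python
from collections import Counter

def adjustment_estimate(samples, x_val, y_val):
    """Plug-in covariate-adjustment estimate of P(Y=y | do(X=x)):
        sum_z Phat(Z=z) * Phat(Y=y | X=x, Z=z).
    `samples` is a list of (x, y, z) tuples, with z a hashable tuple
    over the adjustment set Z. Purely illustrative."""
    n = len(samples)
    pz = Counter(z for _, _, z in samples)
    joint_xz = Counter((x, z) for x, _, z in samples)
    joint_xyz = Counter((x, y, z) for x, y, z in samples)
    total = 0.0
    for z, cz in pz.items():
        if joint_xz[(x_val, z)] == 0:
            continue  # empty stratum: conditional undefined; a careful
                      # estimator must handle this case explicitly
        cond = joint_xyz[(x_val, y_val, z)] / joint_xz[(x_val, z)]
        total += (cz / n) * cond
    return total
```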
Online bipartite matching with imperfect advice
Choo, Davin, Gouleakis, Themis, Ling, Chun Kai, Bhattacharyya, Arnab
We study the problem of online unweighted bipartite matching with $n$ offline vertices and $n$ online vertices, where one wishes to be competitive against the optimal offline algorithm. While the classic RANKING algorithm of Karp et al. [1990] provably attains a competitive ratio of $1-1/e > 1/2$, we show that no learning-augmented method can be both 1-consistent and strictly better than $1/2$-robust under the adversarial arrival model. Meanwhile, under the random arrival model, we show how one can use methods from distribution testing to design an algorithm that takes in external advice about the online vertices and provably achieves a competitive ratio interpolating between any ratio attainable by advice-free methods and the optimal ratio of 1, depending on the advice quality.
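For context, here is a short sketch of the classic RANKING algorithm referenced above (the standard textbook version, not this paper's advice-augmented method): fix a uniformly random priority order over the offline vertices, and match each arriving online vertex to its highest-priority free neighbor.

```python
import random

def ranking(n, neighbors):
    """RANKING (Karp, Vazirani, Vazirani 1990). n offline vertices 0..n-1;
    `neighbors` is, per arriving online vertex, the set of adjacent offline
    vertices. Attains a 1 - 1/e competitive ratio in expectation."""
    order = list(range(n))
    random.shuffle(order)                        # random priority over offline side
    priority = {v: r for r, v in enumerate(order)}
    matched, matching = set(), []
    for online_v, nbrs in enumerate(neighbors):
        free = [u for u in nbrs if u not in matched]
        if free:
            u = min(free, key=priority.__getitem__)  # highest-priority free neighbor
            matched.add(u)
            matching.append((u, online_v))
    return matching

# Example: 3 offline vertices; online vertices arrive with these neighborhoods.
print(ranking(3, [{0, 1}, {1, 2}, {0}]))
```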
Distribution Learning Meets Graph Structure Sampling
Bhattacharyya, Arnab, Gayen, Sutanu, John, Philips George, Sen, Sayantan, Vinodchandran, N. V.
This work establishes a novel link between the problem of PAC-learning high-dimensional graphical models and the task of (efficient) counting and sampling of graph structures, using an online learning framework. We observe that if we apply the exponentially weighted average (EWA) or randomized weighted majority (RWM) forecaster on a sequence of samples from a distribution P using the log loss function, the average regret incurred by the forecaster's predictions can be used to bound the expected KL divergence between P and the predictions. Known regret bounds for EWA and RWM then yield new sample complexity bounds for learning Bayes nets. Moreover, these algorithms can be made computationally efficient for several interesting classes of Bayes nets. Specifically, we give a new sample-optimal, polynomial-time learning algorithm with respect to trees of unknown structure, and the first algorithm with polynomial sample and time complexity for learning with respect to Bayes nets over a given chordal skeleton.
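A minimal sketch of the EWA forecaster under log loss over a finite candidate class, illustrating the averaged predictor that the regret-to-KL argument applies to. This is illustrative only: it fixes a learning rate $\eta$ and assumes all candidate probabilities are strictly positive.

```python
import numpy as np

def ewa_log_loss(candidates, samples, eta=1.0):
    """Exponentially weighted average forecaster under log loss.
    `candidates`: (k, m) array; row j is a candidate distribution over m outcomes
                  (assumed strictly positive).
    `samples`: observed outcomes in {0, ..., m-1}, drawn from an unknown P.
    Returns the average of the per-round mixture predictions; when the
    average regret is small, this predictor is close to P in KL divergence."""
    k = len(candidates)
    log_w = np.zeros(k)                              # log-domain weights, for stability
    preds = []
    for x in samples:
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        preds.append(w @ candidates)                 # mixture prediction this round
        log_w -= eta * (-np.log(candidates[:, x]))   # penalize each candidate's log loss
    return np.mean(preds, axis=0)
```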
Outlier Robust Multivariate Polynomial Regression
Arora, Vipul, Bhattacharyya, Arnab, Boban, Mathews, Guruswami, Venkatesan, Kelman, Esty
We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $\rho < 1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $\chi$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$, if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/\sigma$, at the cost of a higher sample complexity.
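As a simplified illustration of the median-based idea in the univariate case, here is a heuristic sketch: since the outlier fraction is $\rho < 1/2$, the median response within a small bin of $[-1, 1]$ resists corruption, and one can then fit a polynomial to the bin medians. This is not the paper's multivariate algorithm, and it simplifies Kane, Karmalkar, and Price [FOCS'17], who use Chebyshev partitions and an $\ell_\infty$ regression step.

```python
import numpy as np
from numpy.polynomial import Polynomial

def robust_poly_fit_1d(xs, ys, d, n_bins=None):
    """Heuristic robust fit for the n = 1 case. `xs`, `ys`: numpy arrays of
    samples in [-1, 1] x R with at most a rho < 1/2 fraction of outliers.
    Split [-1, 1] into bins, take the median response per bin (medians
    tolerate a minority of outliers), then least-squares fit a degree-d
    polynomial to the bin medians."""
    n_bins = n_bins or 4 * (d + 1)
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    centers, medians = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (xs >= lo) & (xs < hi)
        if mask.any():
            centers.append((lo + hi) / 2)
            medians.append(np.median(ys[mask]))
    return Polynomial.fit(centers, medians, d)
```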
Optimal estimation of Gaussian (poly)trees
Wang, Yuhao, Gao, Ming, Tai, Wai Ming, Aragam, Bryon, Bhattacharyya, Arnab
We develop optimal algorithms for learning undirected Gaussian trees and directed Gaussian polytrees from data. We consider both problems of distribution learning (i.e. in KL distance) and structure learning (i.e. exact recovery). The first approach is based on the Chow-Liu algorithm, and learns an optimal tree-structured distribution efficiently. The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional independence tester for constraint-based structure learning. We derive explicit finite-sample guarantees for both approaches, and show that both approaches are optimal by deriving matching lower bounds. Additionally, we conduct numerical experiments to compare the performance of various algorithms, providing further insights and empirical evidence.
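For jointly Gaussian variables, the pairwise mutual information is $I(X_i; X_j) = -\tfrac{1}{2}\log(1 - \rho_{ij}^2)$, which is monotone in $|\rho_{ij}|$, so the Chow-Liu tree is a maximum-weight spanning tree under edge weights $|\rho_{ij}|$. Below is a minimal plug-in sketch of this first approach; the paper's contribution is the finite-sample optimality analysis, not this routine itself.

```python
import numpy as np

def gaussian_chow_liu(samples):
    """Chow-Liu tree for multivariate Gaussian data: maximum-weight spanning
    tree (via Prim's algorithm) on |empirical correlation| edge weights.
    `samples`: (num_samples, p) array. Returns a list of tree edges."""
    corr = np.corrcoef(samples, rowvar=False)
    p = corr.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < p:
        best = max(((i, j) for i in in_tree for j in range(p) if j not in in_tree),
                   key=lambda e: abs(corr[e]))
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Example: data from a chain X0 -> X1 -> X2 should be recovered as a path.
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = 0.8 * x0 + rng.normal(size=5000)
x2 = 0.8 * x1 + rng.normal(size=5000)
print(gaussian_chow_liu(np.column_stack([x0, x1, x2])))
```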
Learning bounded-degree polytrees with known skeleton
Choo, Davin, Yang, Joy Qiping, Bhattacharyya, Arnab, Canonne, Clément L.
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters is nearly tight.
Total Variation Distance Estimation Is as Easy as Probabilistic Inference
Bhattacharyya, Arnab, Gayen, Sutanu, Meel, Kuldeep S., Myrisiotis, Dimitrios, Pavan, A., Vinodchandran, N. V.
Machine learning and data science heavily rely on probability distributions that are widely used to capture dependencies among a large number of variables. Such high-dimensional distributions naturally appear in various domains including neuroscience [ROL02, CTY06], bioinformatics [BB01], text and image processing [Mur22], and causal inference [Pea09]. Substantial research has been devoted to developing models that represent high-dimensional probability distributions succinctly. One prevalent approach is through graphical models. In a graphical model, a graph describes the conditional dependencies among variables, and the probability distribution is factorized according to the adjacency relationships in the graph [KF09]. When the underlying graph is directed, the model is known as a Bayesian network or Bayes net. Two fundamental computational tasks on distributions are distance computation and probabilistic inference. In this work, we establish a novel connection between these two seemingly different computational tasks.
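To fix ideas, the total variation distance between distributions $P, Q$ over $\{0,1\}^n$ is $d_{TV}(P, Q) = \tfrac{1}{2}\sum_x |P(x) - Q(x)|$; computing it by enumeration takes time exponential in $n$, which is precisely why a reduction to probabilistic inference is valuable. Here is a toy brute-force sketch for tiny Bayes nets over binary variables, with illustrative data structures.

```python
from itertools import product

def bn_prob(assignment, cpts, parents):
    """Probability of a full 0/1 assignment under a Bayes net, given as
    conditional probability tables: cpts[v][parent_vals] = P(v = 1 | pa(v))."""
    p = 1.0
    for v, pa in parents.items():
        pv = cpts[v][tuple(assignment[u] for u in pa)]
        p *= pv if assignment[v] == 1 else 1.0 - pv
    return p

def tv_bruteforce(cpts1, cpts2, parents, n):
    """d_TV(P, Q) = (1/2) sum_x |P(x) - Q(x)| over all 2^n assignments.
    Feasible only for tiny n; efficient TV estimation needs more than this."""
    total = 0.0
    for bits in product([0, 1], repeat=n):
        x = dict(enumerate(bits))
        total += abs(bn_prob(x, cpts1, parents) - bn_prob(x, cpts2, parents))
    return total / 2

# Two-node net 0 -> 1 with slightly different CPTs; the answer is 0.05.
parents = {0: (), 1: (0,)}
cpts1 = {0: {(): 0.5}, 1: {(0,): 0.2, (1,): 0.9}}
cpts2 = {0: {(): 0.5}, 1: {(0,): 0.3, (1,): 0.9}}
print(tv_bruteforce(cpts1, cpts2, parents, n=2))
```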