Goto

Collaborating Authors

 guntuboyina


Score-Based Diffusion Modeling for Nonparametric Empirical Bayes in Heteroscedastic Gaussian Mixtures

Neural Information Processing Systems

We propose a generalized score-based diffusion framework for learning multivariate Gaussian mixture models with homoscedastic or heteroscedastic noise. Our goal is to nonparametrically estimate the latent location distribution and denoise the observations.


Sharp Inequalities between Total Variation and Hellinger Distances for Gaussian Mixtures

arXiv.org Machine Learning

Sharp Inequalities between Total Variation and Hellinger Distances for Gaussian Mixtures Joonhyuk Jung 1 and Chao Gao 1 1 Department of Statistics, University of Chicago Abstract We study the relation between the total variation (TV) and Hellinger distances between two Gaussian location mixtures. Our first result establishes a general upper bound: for any two mixing distributions supported on a compact set, the Hellinger distance between the two mixtures is controlled by the TV distance raised to a power 1 o(1), where the o(1) term is of order 1/log log(1/TV). We also construct two sequences of mixing distributions that demonstrate the sharpness of this bound. Taken together, our results resolve an open problem raised in Jia et al. (2023) and thus lead to an entropic characterization of learning Gaussian mixtures in total variation. Our inequality also yields optimal robust estimation of Gaussian mixtures in Hellinger distance, which has a direct implication for bounding the minimax regret of empirical Bayes under Huber contamination. 1 Introduction The Gaussian location mixture is one of the most fundamental models used in nonparametric density estimation, Bayesian inference, and clustering (Lindsay, 1995; Dasgupta, 1999). Given a probability measure ฯ€ supported on R d, the induced Gaussian mixture is defined by f ฯ€(x) = null R dฯ• d(x ฮธ)dฯ€(ฮธ), where ฯ• d(x) = (2ฯ€) d/2 exp( x 2 2/2) is the density function of the d-dimensional standard Gaussian distribution. In this paper, we study the relation between the total variation distance TV(p,q):= 1 2 null |p q| and the Hellinger distance H(p,q):= null 1 2 null ( p q) 2 of two Gaussian mixture densities.


MARS via LASSO

arXiv.org Machine Learning

MARS is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural LASSO variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation based complexity constraint. We show that our estimator can be computed via finite-dimensional convex optimization and that it is naturally connected to nonparametric function estimation techniques based on smoothness constraints. Under a simple design assumption, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We implement our method with a cross-validation scheme for the selection of the involved tuning parameter and show that it has favorable performance compared to the usual MARS method in simulation and real data settings.


Convex Regression in Multidimensions: Suboptimality of Least Squares Estimators

arXiv.org Machine Learning

The least squares estimator (LSE) is shown to be suboptimal in squared error loss in the usual nonparametric regression model with Gaussian errors for $d \geq 5$ for each of the following families of functions: (i) convex functions supported on a polytope (in fixed design), (ii) bounded convex functions supported on a polytope (in random design), and (iii) convex Lipschitz functions supported on any convex domain (in random design). For each of these families, the risk of the LSE is proved to be of the order $n^{-2/d}$ (up to logarithmic factors) while the minimax risk is $n^{-4/(d+4)}$, for $d \ge 5$. In addition, the first rate of convergence results (worst case and adaptive) for the full convex LSE are established for polytopal domains for all $d \geq 1$. Some new metric entropy results for convex functions are also proved which are of independent interest.


Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation

arXiv.org Machine Learning

We consider the problem of nonparametric regression when the covariate is $d$-dimensional, where $d \geq 1$. In this paper we introduce and study two nonparametric least squares estimators (LSEs) in this setting---the entirely monotonic LSE and the constrained Hardy-Krause variation LSE. We show that these two LSEs are natural generalizations of univariate isotonic regression and univariate total variation denoising, respectively, to multiple dimensions. We discuss the characterization and computation of these two LSEs obtained from $n$ data points. We provide a detailed study of their risk properties under the squared error loss and fixed uniform lattice design. We show that the finite sample risk of these LSEs is always bounded from above by $n^{-2/3}$ modulo logarithmic factors depending on $d$; thus these nonparametric LSEs avoid the curse of dimensionality to some extent. For the case of the Hardy-Krause variation LSE, we also show that logarithmic factors which increase with $d$ are necessary in the risk upper bound by proving a minimax lower bound. Further, we illustrate that these LSEs are particularly useful in fitting rectangular piecewise constant functions. Specifically, we show that the risk of the entirely monotonic LSE is almost parametric (at most $1/n$ up to logarithmic factors) when the true function is well-approximable by a rectangular piecewise constant entirely monotone function with not too many constant pieces. A similar result is also shown to hold for the constrained Hardy-Krause variation LSE for a simple subclass of rectangular piecewise constant functions. We believe that the proposed LSEs yield a novel approach to estimating multivariate functions using convex optimization that avoid the curse of dimensionality to some extent.