Goto

Collaborating Authors

 density estimation


Communication-Efficient Distributed Learning of Discrete Distributions

Neural Information Processing Systems

We initiate a systematic investigation of distribution learning (density estimation) when the data is distributed across multiple servers. The servers must communicate with a referee and the goal is to estimate the underlying distribution with as few bits of communication as possible. We focus on non-parametric density estimation of discrete distributions with respect to the l1 and l2 norms. We provide the first non-trivial upper and lower bounds on the communication complexity of this basic estimation task in various settings of interest. Specifically, our results include the following: 1. When the unknown discrete distribution is unstructured and each server has only one sample, we show that any blackboard protocol (i.e., any protocol in which servers interact arbitrarily using public messages) that learns the distribution must essentially communicate the entire sample.


Scalable Uncertainty Quantification for Black-Box Density-Based Clustering

Bariletto, Nicola, Walker, Stephen G.

arXiv.org Machine Learning

We introduce a novel framework for uncertainty quantification in clustering. By combining the martingale posterior paradigm with density-based clustering, uncertainty in the estimated density is naturally propagated to the clustering structure. The approach scales effectively to high-dimensional and irregularly shaped data by leveraging modern neural density estimators and GPU-friendly parallel computation. We establish frequen-tist consistency guarantees and validate the methodology on synthetic and real data.







Instance-Optimal Private Density Estimation in the Wasserstein Distance

Neural Information Processing Systems

Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances.