 density function


Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy

arXiv.org Machine Learning

We design a class of additive noise mechanisms that satisfy \((\varepsilon, \delta)\)-differential privacy (DP) for scalar, real-valued query functions with known sensitivities, with a particular focus on moderate and low-privacy regimes. These mechanisms, which we call \textit{mixture mechanisms}, are constructed by mixing multiple Gaussian distributions that share the same variance but differ in their means and mixture weights. The resulting distributions can be interpreted as convex combinations of a zero-mean Gaussian (as used in the analytic Gaussian mechanism) and additional Gaussians whose means depend on the sensitivity of the query function. We derive tight conditions on the variances required for \((\varepsilon, \delta)\)-DP and provide efficient algorithms to compute them. Compared to the analytic Gaussian mechanism, our mechanisms yield substantially lower expected noise amplitudes (\(l_1\)-loss) and variances (\(l_2\)-loss for zero-mean distributions). In the low-privacy regime that motivates our design, our mechanisms approach optimality, mitigating nearly all of the optimality gap of the analytic Gaussian mechanism.
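The noise distribution described above can be sketched in a few lines: pick a mixture component by its weight, then add shared-variance Gaussian noise around that component's mean. The sketch also computes the two loss quantities the abstract compares, using the closed-form mean of a folded normal for \(l_1\) and \(\sigma^2 + \sum_k w_k \mu_k^2\) for the second moment. The means, weights and \(\sigma\) used here are illustrative placeholders, not values calibrated by the paper's \((\varepsilon, \delta)\) conditions, and all function names are hypothetical.

```python
import math
import numpy as np


def sample_mixture_noise(means, weights, sigma, size, rng):
    """Draw from sum_k w_k * N(mu_k, sigma^2): choose a component, then add Gaussian noise."""
    means = np.asarray(means, dtype=float)
    idx = rng.choice(len(means), size=size, p=weights)
    return means[idx] + sigma * rng.standard_normal(size)


def expected_abs(mu, sigma):
    """E|Y| for Y ~ N(mu, sigma^2), i.e. the mean of a folded normal."""
    return (sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu ** 2 / (2.0 * sigma ** 2))
            + mu * math.erf(mu / (sigma * math.sqrt(2.0))))


def l1_loss(means, weights, sigma):
    """Expected noise amplitude of the mixture: weighted sum of per-component E|Y|."""
    return sum(w * expected_abs(m, sigma) for m, w in zip(means, weights))


def second_moment(means, weights, sigma):
    """E[Y^2] = sigma^2 + sum_k w_k mu_k^2; this is the variance when the mixture has zero mean."""
    return sigma ** 2 + sum(w * m ** 2 for m, w in zip(means, weights))


# Illustrative symmetric mixture for sensitivity 1 (NOT derived from any privacy condition).
sensitivity, sigma = 1.0, 1.0
means, weights = [-sensitivity, 0.0, sensitivity], [0.25, 0.5, 0.25]
rng = np.random.default_rng(0)
noise = sample_mixture_noise(means, weights, sigma, 200_000, rng)
print(l1_loss(means, weights, sigma), np.mean(np.abs(noise)))
print(second_moment(means, weights, sigma), np.var(noise))
```

Because the components share one variance, both losses reduce to weighted sums over components, which makes it cheap to compare candidate mixtures once their variance has been fixed by a privacy calibration.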


Adaptive Kernel Density Estimation with Pre-training

arXiv.org Machine Learning

Density estimation in high-dimensional settings is an important and challenging statistical problem. Traditional methods based on kernel smoothing are inefficient in high dimensions due to the difficulties in specifying appropriate location-adaptive kernels. In this work, we introduce pre-training, a key idea behind many cutting-edge AI technologies, to the context of non-parametric density estimation. By establishing a pre-trained neural network that can recommend an appropriate location-adaptive kernel for each sample point, efficient density estimation with adaptive kernels is achieved in high dimensions. A wide range of numerical experiments show that this strategy is highly effective for improving density-estimation accuracy, when the target distribution is close to the distribution family for pre-training. When the target distribution is substantially different from the pre-training distribution family, the benefit from the proposed pre-training strategy may be diluted, but can be reactivated by an additional fine-tuning procedure.
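A sample-point adaptive kernel density estimate assigns one bandwidth to each sample, \(\hat f(x) = \frac{1}{n}\sum_i \mathcal{N}(x; x_i, h_i^2 I)\). The sketch below assumes isotropic Gaussian kernels and, in place of the pre-trained network that recommends kernels in the paper, uses a plain k-nearest-neighbour heuristic (\(h_i\) = distance from \(x_i\) to its k-th nearest neighbour). The heuristic, the value of k and the helper names are hypothetical stand-ins chosen only to show where per-point kernels enter the estimator.

```python
import numpy as np


def knn_bandwidths(samples, k=20):
    """Stand-in for a learned recommender: h_i = distance from x_i to its k-th nearest neighbour."""
    diffs = samples[:, None, :] - samples[None, :, :]
    dists = np.sort(np.sqrt((diffs ** 2).sum(axis=-1)), axis=1)
    # Column 0 holds each point's zero distance to itself, so column k is the k-th neighbour.
    return np.maximum(dists[:, k], 1e-12)


def adaptive_kde(x, samples, bandwidths):
    """Evaluate f(x) = (1/n) * sum_i N(x; x_i, h_i^2 I) at query points x of shape (m, d)."""
    d = samples.shape[1]
    diffs = x[:, None, :] - samples[None, :, :]                  # (m, n, d)
    sq = (diffs ** 2).sum(axis=-1) / bandwidths[None, :] ** 2    # (m, n)
    norm = (2.0 * np.pi) ** (d / 2.0) * bandwidths ** d          # (n,) Gaussian normalizers
    return (np.exp(-0.5 * sq) / norm[None, :]).mean(axis=1)


rng = np.random.default_rng(1)
samples = rng.standard_normal((500, 1))
h = knn_bandwidths(samples, k=20)
grid = np.linspace(-8.0, 8.0, 3201)[:, None]
density = adaptive_kde(grid, samples, h)
```

Narrow kernels land where samples are dense and wide kernels in the sparse tails; swapping `knn_bandwidths` for a learned model (or a full per-point covariance) leaves `adaptive_kde` structurally unchanged.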


Data-Efficient Non-Gaussian Semi-Nonparametric Density Estimation for Nonlinear Dynamical Systems

arXiv.org Machine Learning

Accurate representation of non-Gaussian distributions of quantities of interest in nonlinear dynamical systems is critical for estimation, control, and decision-making, but can be challenging when forward propagations are expensive to carry out. This paper presents an approach for estimating probability density functions of states evolving under nonlinear dynamics using semi-nonparametric (SNP), or Gallant-Nychka, densities. SNP densities employ a probabilists' Hermite polynomial basis to model non-Gaussian behavior and are positive everywhere on the support by construction. We use Monte Carlo to approximate the expectation integrals that arise in the maximum likelihood estimation of SNP coefficients, and introduce a convex relaxation to generate effective initial estimates. The method is demonstrated on density and quantile estimation for the chaotic Lorenz system. The results demonstrate that the proposed method can accurately capture non-Gaussian density structure and compute quantiles using significantly fewer samples than raw Monte Carlo sampling.
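To illustrate the density family itself (not the paper's Monte Carlo maximum-likelihood fit or its convex relaxation), the sketch below evaluates a one-dimensional SNP density \(f(x) = P(z)^2 \varphi(z) / (\sigma \sum_k a_k^2)\) with \(z = (x - \mu)/\sigma\), \(\varphi\) the standard normal density, and \(P(z) = \sum_k a_k \mathrm{He}_k(z)/\sqrt{k!}\). Scaling the probabilists' Hermite polynomials by \(1/\sqrt{k!}\) makes them orthonormal under \(\varphi\), which is why \(\sum_k a_k^2\) is the exact normalizing constant. The location/scale parameterization and the function name are assumptions made for this sketch.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def snp_density(x, coeffs, mu=0.0, sigma=1.0):
    """One-dimensional SNP (Gallant-Nychka) density with Hermite coefficients a_0..a_K."""
    a = np.asarray(coeffs, dtype=float)
    # Rescale so that He_k(z) / sqrt(k!) is orthonormal under the standard normal weight.
    scaled = a / np.sqrt([math.factorial(k) for k in range(len(a))])
    z = (np.asarray(x, dtype=float) - mu) / sigma
    poly = hermeval(z, scaled)
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    # Squaring keeps the density nonnegative; sum(a_k^2) makes it integrate to one.
    return poly ** 2 * phi / (sigma * np.sum(a ** 2))


xs = np.linspace(-40.0, 40.0, 80001)
f = snp_density(xs, [1.0, 0.3, -0.2, 0.1], mu=0.5, sigma=2.0)
print(f.sum() * (xs[1] - xs[0]))  # approximately 1.0
```

With `coeffs=[1.0]` the polynomial is constant and the density reduces to \(\mathcal{N}(\mu, \sigma^2)\); higher-order coefficients add skewness, heavier tails or multimodality without any positivity constraint on the \(a_k\).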


Density Estimation via Discrepancy Based Adaptive Sequential Partition

Neural Information Processing Systems

Given i.i.d. observations from an unknown absolutely continuous distribution defined on some domain Ω, we propose a nonparametric method to learn a piecewise constant function that approximates the underlying probability density function. Our density estimate is a piecewise constant function defined on a binary partition of Ω. The key ingredient of the algorithm is the use of discrepancy, a concept that originates from quasi-Monte Carlo analysis, to control the partition process. The resulting algorithm is simple, efficient, and has a provable convergence rate. We empirically demonstrate its efficiency as a density estimation method. We also show how it can be utilized to find good initializations for k-means.
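A one-dimensional sketch of the idea: recursively halve an interval whenever the points inside it look far from uniform as measured by their star discrepancy, then report count / (n · cell length) on each final cell. In 1D the star discrepancy of points rescaled to [0, 1] equals their Kolmogorov-Smirnov distance to the uniform distribution, so it can be computed exactly after sorting. The stopping rule (stop when \(\sqrt{m}\,D^*\) is at most a constant, with m the number of points in the cell), the midpoint split and all names are assumptions for illustration; the paper works on a general domain Ω and its exact splitting criterion may differ.

```python
import numpy as np


def star_discrepancy_1d(u):
    """Exact star discrepancy of points u in [0, 1] (the KS distance to Uniform(0, 1))."""
    u = np.sort(u)
    n = len(u)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))


def partition(points, lo, hi, c=1.36, min_points=10, max_depth=20, depth=0):
    """Leaf cells (lo, hi, count) of a discrepancy-controlled binary partition of [lo, hi]."""
    m = len(points)
    if m < min_points or depth >= max_depth:
        return [(lo, hi, m)]
    u = (points - lo) / (hi - lo)
    if np.sqrt(m) * star_discrepancy_1d(u) <= c:  # close enough to uniform: keep the cell
        return [(lo, hi, m)]
    mid = 0.5 * (lo + hi)
    return (partition(points[points < mid], lo, mid, c, min_points, max_depth, depth + 1)
            + partition(points[points >= mid], mid, hi, c, min_points, max_depth, depth + 1))


def piecewise_density(leaves, n_total, t):
    """Piecewise constant estimate: count / (n_total * cell length) on the cell containing t."""
    for lo, hi, count in leaves:
        if lo <= t < hi:
            return count / (n_total * (hi - lo))
    return 0.0


rng = np.random.default_rng(2)
data = rng.beta(2.0, 5.0, size=2000)
leaves = partition(data, 0.0, 1.0)
print(len(leaves), piecewise_density(leaves, len(data), 0.2), piecewise_density(leaves, len(data), 0.9))
```

Cells stop splitting once their contents are statistically indistinguishable from uniform, so the partition refines where the density varies quickly and stays coarse where it is flat.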




Processing of missing data by neural networks

Neural Information Processing Systems

Our idea is to replace a typical neuron's response in the first hidden layer by its expected value. This approach can be applied to various types of networks at minimal cost in their modification. Moreover, in contrast to recent approaches, it does not require complete data for training. Experimental results on different types of architectures show that our method gives better results than typical imputation strategies and other methods dedicated to incomplete data.
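For a ReLU first layer, the expected response has a closed form once the missing inputs are given a Gaussian distribution: each pre-activation \(z = w^\top x + b\) is then Gaussian with mean \(\mu\) and standard deviation \(s\), and \(\mathbb{E}[\max(z, 0)] = \mu\,\Phi(\mu/s) + s\,\varphi(\mu/s)\). The sketch below assumes a single Gaussian \(\mathcal{N}(m, S)\) over the missing coordinates, supplied by the caller (for example, a conditional distribution given the observed values); this missing-data model and the function name are assumptions made for the sketch.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def expected_relu_layer(x, W, b, m_mis, S_mis):
    """E[ReLU(W x + b)] when the NaN entries of x follow N(m_mis, S_mis) (hypothetical model)."""
    missing = np.isnan(x)
    W_obs, W_mis = W[:, ~missing], W[:, missing]
    mu = W_obs @ x[~missing] + W_mis @ m_mis + b       # mean of each pre-activation
    var = np.sum((W_mis @ S_mis) * W_mis, axis=1)      # w_mis^T S_mis w_mis for each neuron
    s = np.sqrt(np.maximum(var, 0.0))
    out = np.maximum(mu, 0.0)                          # zero variance: plain ReLU
    pos = s > 0
    t = mu[pos] / s[pos]
    cdf = 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)
    out[pos] = mu[pos] * cdf + s[pos] * pdf
    return out


rng = np.random.default_rng(3)
W = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = np.array([0.5, np.nan, -1.0])
m_mis, S_mis = np.array([0.2]), np.array([[0.5]])
analytic = expected_relu_layer(x, W, b, m_mis, S_mis)
# Monte Carlo check: fill the missing entry with samples and average the ReLU outputs.
fills = rng.normal(0.2, math.sqrt(0.5), size=200_000)
X = np.tile(x, (fills.size, 1))
X[:, 1] = fills
mc = np.maximum(X @ W.T + b, 0.0).mean(axis=0)
```

The rest of the network consumes `analytic` exactly like an ordinary first-layer output, which is why only the first layer needs to change.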




e4a6222cdb5b34375400904f03d8e6a5-Paper.pdf

Neural Information Processing Systems

In this work, we propose sampling-argmax, a differentiable training method that imposes implicit constraints on the shape of the probability map by minimizing the expectation of the localization error.
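The difference from soft-argmax shows up on a one-dimensional heatmap: soft-argmax penalizes the error of the expected location, while the objective described here is the expected localization error under the probability map. The NumPy sketch below computes that expectation exactly by summing over grid cells, rather than by sampling as the method's name suggests it does, and uses an L1 error; both choices and the function names are assumptions for illustration. A bimodal map whose two modes straddle the target makes the contrast visible.

```python
import numpy as np


def softmax(logits):
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def soft_argmax_error(logits, coords, target):
    """Error of the expectation: |E_p[x] - target| (what soft-argmax training penalizes)."""
    p = softmax(logits)
    return abs(np.dot(p, coords) - target)


def expected_localization_error(logits, coords, target):
    """Expectation of the error: E_p[|x - target|], summed exactly over the grid."""
    p = softmax(logits)
    return np.dot(p, np.abs(coords - target))


coords = np.arange(11.0)
target = 5.0
bimodal = np.full(11, -50.0)
bimodal[[2, 8]] = 0.0   # two equal peaks placed symmetrically around the target
peaked = np.full(11, -50.0)
peaked[5] = 0.0         # a single peak on the target
print(soft_argmax_error(bimodal, coords, target))            # ~0: soft-argmax is satisfied
print(expected_localization_error(bimodal, coords, target))  # ~3: the spread is penalized
print(expected_localization_error(peaked, coords, target))   # ~0
```

Only a map that concentrates its mass near the target drives the expected error to zero, which is the implicit shape constraint the abstract refers to.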