 tv-distance


On approximating the $f$-divergence between two Ising models

arXiv.org Artificial Intelligence

The $f$-divergence is a fundamental notion that measures the difference between two distributions. In this paper, we study the problem of approximating the $f$-divergence between two Ising models, which is a generalization of recent work on approximating the TV-distance. Given two Ising models $\nu$ and $\mu$, which are specified by their interaction matrices and external fields, the problem is to approximate the $f$-divergence $D_f(\nu\,\|\,\mu)$ within an arbitrary relative error $\mathrm{e}^{\pm \varepsilon}$. For $\chi^\alpha$-divergence with a constant integer $\alpha$, we establish both algorithmic and hardness results. The algorithm works in a parameter regime that matches the hardness result. Our algorithm can be extended to other $f$-divergences such as $\alpha$-divergence, Kullback-Leibler divergence, Rényi divergence, Jensen-Shannon divergence, and squared Hellinger distance.
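To make the quantity in this abstract concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It enumerates all $2^n$ configurations of two tiny Ising models and evaluates $D_f(\nu\,\|\,\mu) = \sum_x \mu(x)\, f(\nu(x)/\mu(x))$ directly, which is feasible only for very small $n$. The weight convention $\exp(\tfrac{1}{2}\sigma^\top J \sigma + h^\top \sigma)$ with $\sigma \in \{-1,+1\}^n$, the function names, and the example matrices are all assumptions made for illustration.

```python
import itertools
import math


def ising_distribution(J, h):
    """Return {configuration: probability} for a tiny Ising model by enumeration.

    Assumed convention (not taken from the paper):
    weight(s) = exp(0.5 * s^T J s + h^T s) for s in {-1, +1}^n.
    """
    n = len(h)
    weights = {}
    for s in itertools.product((-1, 1), repeat=n):
        energy = sum(h[i] * s[i] for i in range(n))
        energy += 0.5 * sum(
            J[i][j] * s[i] * s[j] for i in range(n) for j in range(n)
        )
        weights[s] = math.exp(energy)
    partition = sum(weights.values())  # normalizing constant Z
    return {s: w / partition for s, w in weights.items()}


def f_divergence(nu, mu, f):
    """D_f(nu || mu) = sum_x mu(x) * f(nu(x) / mu(x)); assumes mu(x) > 0."""
    return sum(mu[x] * f(nu[x] / mu[x]) for x in mu)


def tv_f(t):
    """Generator f(t) = |t - 1| / 2, which yields the TV-distance."""
    return 0.5 * abs(t - 1.0)


def kl_f(t):
    """Generator f(t) = t log t, which yields the Kullback-Leibler divergence."""
    return t * math.log(t) if t > 0 else 0.0


# Hypothetical 3-spin examples: same external field, different interactions.
J_nu = [[0.0, 0.3, 0.0], [0.3, 0.0, 0.3], [0.0, 0.3, 0.0]]
J_mu = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.0]]
h = [0.2, 0.0, -0.2]

nu = ising_distribution(J_nu, h)
mu = ising_distribution(J_mu, h)

print("TV:", f_divergence(nu, mu, tv_f))
print("KL:", f_divergence(nu, mu, kl_f))
```

The exponential cost of this enumeration is what motivates approximation algorithms with relative-error guarantees for larger models.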



Preconditioned Discrete-HAMS: A Second-order Irreversible Discrete Sampler

arXiv.org Machine Learning

Gradient-based Markov Chain Monte Carlo methods have recently received much attention for sampling discrete distributions, with notable examples such as Norm Constrained Gradient (NCG), Auxiliary Variable Gradient (AVG), and Discrete Hamiltonian Assisted Metropolis Sampling (DHAMS). In this work, we propose the Preconditioned Discrete-HAMS (PDHAMS) algorithm, which extends DHAMS by incorporating a second-order, quadratic approximation of the potential function, and uses the Gaussian integral trick to avoid directly sampling a pairwise Markov random field. The PDHAMS sampler not only satisfies generalized detailed balance, hence enabling irreversible sampling, but also achieves a rejection-free property for a target distribution with a quadratic potential function. In various numerical experiments, PDHAMS algorithms consistently yield superior performance compared with other methods.


Approximating the Total Variation Distance between Gaussians

arXiv.org Artificial Intelligence

The total variation distance is a metric of central importance in statistics and probability theory. However, somewhat surprisingly, questions about computing it algorithmically appear not to have been systematically studied until very recently. In this paper, we contribute to this line of work by studying this question in the important special case of multivariate Gaussians. More formally, we consider the problem of approximating the total variation distance between two multivariate Gaussians to within an $\epsilon$-relative error. Previous works achieved a fixed constant relative error approximation via closed-form formulas. In this work, we give algorithms that, given any two $n$-dimensional Gaussians $D_1,D_2$ and any error bound $\epsilon > 0$, approximate the total variation distance $D := d_{TV}(D_1,D_2)$ to $\epsilon$-relative accuracy in $\text{poly}(n,\frac{1}{\epsilon},\log \frac{1}{D})$ operations. The main technical tool in our work is a reduction that helps us extend the recent progress on computing the TV-distance between discrete random variables to our continuous setting.
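As a small sanity check on the quantity being approximated, the sketch below covers only the one-dimensional case and is not the paper's algorithm. For two univariate Gaussians with equal standard deviation $\sigma$, the well-known closed form is $d_{TV} = \mathrm{erf}\!\left(|m_1 - m_2| / (2\sqrt{2}\,\sigma)\right)$. The code compares it with a trapezoidal evaluation of $\tfrac{1}{2}\int |p - q|$. The function names, grid size, and integration window are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, std):
    """Density of the univariate normal N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def tv_equal_variance(m1, m2, std):
    """Closed form for equal variances: erf(|m1 - m2| / (2 * sqrt(2) * std))."""
    return math.erf(abs(m1 - m2) / (2.0 * math.sqrt(2.0) * std))


def tv_numeric(m1, s1, m2, s2, steps=200_000, width=12.0):
    """Trapezoidal estimate of 0.5 * integral |p - q| over a wide window."""
    lo = min(m1 - width * s1, m2 - width * s2)
    hi = max(m1 + width * s1, m2 + width * s2)
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * abs(normal_pdf(x, m1, s1) - normal_pdf(x, m2, s2))
    return 0.5 * total * dx


exact = tv_equal_variance(0.0, 1.0, 1.0)
approx = tv_numeric(0.0, 1.0, 1.0, 1.0)
print("closed form:", exact)
print("numeric    :", approx)
```

Grid integration of this kind does not scale to $n$ dimensions and gives no relative-error guarantee when $D$ is tiny, which is the regime the abstract's $\log \frac{1}{D}$ dependence addresses.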


Approximating the total variation distance between spin systems

arXiv.org Artificial Intelligence

Spin systems form an important class of undirected graphical models. For two Gibbs distributions $\mu$ and $\nu$ induced by two spin systems on the same graph $G = (V, E)$, we study the problem of approximating the total variation distance $d_{TV}(\mu,\nu)$ with an $\epsilon$-relative error. We propose a new reduction that connects the problem of approximating the TV-distance to sampling and approximate counting. Our applications include the hardcore model and the antiferromagnetic Ising model in the uniqueness regime, the ferromagnetic Ising model, and the general Ising model satisfying the spectral condition. Additionally, we explore the computational complexity of approximating the total variation distance $d_{TV}(\mu_S,\nu_S)$ between two marginal distributions on an arbitrary subset $S \subseteq V$. We prove that this problem remains hard even when both $\mu$ and $\nu$ admit polynomial-time sampling and approximate counting algorithms.


Locally Private Sampling with Public Data

arXiv.org Artificial Intelligence

Local differential privacy (LDP) is increasingly employed in privacy-preserving machine learning to protect user data before sharing it with an untrusted aggregator. Most LDP methods assume that users possess only a single data record, which is a significant limitation since users often gather extensive datasets (e.g., images, text, time-series data) and frequently have access to public datasets. To address this limitation, we propose a locally private sampling framework that leverages both the private and public datasets of each user. Specifically, we assume each user has two distributions: $p$ and $q$ that represent their private dataset and the public dataset, respectively. The objective is to design a mechanism that generates a private sample approximating $p$ while simultaneously preserving $q$. We frame this objective as a minimax optimization problem using $f$-divergence as the utility measure. We fully characterize the minimax optimal mechanisms for general $f$-divergences provided that $p$ and $q$ are discrete distributions. Remarkably, we demonstrate that this optimal mechanism is universal across all $f$-divergences. Experiments validate the effectiveness of our minimax optimal sampler compared to the state-of-the-art locally private sampler.


Learning Mixtures of Markov Chains with Quality Guarantees

arXiv.org Artificial Intelligence

A large number of modern applications, ranging from listening to songs online and browsing the Web to using a navigation app on a smartphone, generate a plethora of user trails. Clustering such trails into groups with a common sequence pattern can reveal significant structure in human behavior that can lead to improving user experience through better recommendations, and even prevent suicides [LMCR14]. One approach to modeling this problem mathematically is as a mixture of Markov chains. Recently, Gupta, Kumar and Vassilvitskii [GKV16] introduced an algorithm (GKV-SVD) based on the singular value decomposition (SVD) that under certain conditions can perfectly recover a mixture of L chains on n states, given only the distribution of trails of length 3 (3-trail). In this work we contribute to the problem of unmixing Markov chains by highlighting and addressing two important constraints of the GKV-SVD algorithm [GKV16]: first, some chains in the mixture may not even be weakly connected, and second, in practice one does not know beforehand the true number of chains. We resolve these issues in the Gupta et al. paper [GKV16]. Specifically, we propose an algebraic criterion that enables us to choose a value of L efficiently that avoids overfitting. Furthermore, we design a reconstruction algorithm that outputs the true mixture in the presence of disconnected chains and is robust to noise. We complement our theoretical results with experiments on both synthetic and real data, where we observe that our method outperforms the GKV-SVD algorithm. Finally, we empirically observe that combining an EM-algorithm with our method performs best in practice, both in terms of reconstruction error with respect to the distribution of 3-trails and the mixture of Markov Chains.