relative entropy
- Europe > Sweden > Stockholm > Stockholm (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- Europe > Poland > Lesser Poland Province > Kraków (0.05)
- (8 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > China (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)
Empirical Risk Minimization with $f$-Divergence Regularization
Daunas, Francisco, Esnaola, Iñaki, Perlaza, Samir M., Poor, H. Vincent
This paper presents the solution to the empirical risk minimization problem with $f$-divergence regularization (ERM-$f$DR) and establishes conditions under which this solution also solves the minimization of the expected empirical risk subject to an $f$-divergence constraint. The proposed approach extends applicability to a broader class of $f$-divergences than previously reported, while recovering previously known results. Additionally, the difference between the expected empirical risk of the ERM-$f$DR solution and that of its reference measure is characterized, providing insights into previously studied cases of $f$-divergences. A central contribution is the introduction of the normalization function, a mathematical object that is critical to both the dual formulation and the practical computation of the ERM-$f$DR solution. This work presents an implicit characterization of the normalization function as a nonlinear ordinary differential equation (ODE), establishes its key properties, and leverages them to construct a numerical algorithm for approximating the normalization function under mild assumptions. Further analysis demonstrates structural equivalences between ERM-$f$DR problems with different $f$-divergences via transformations of the empirical risk. Finally, the proposed algorithm is used to compute the training and test risks of ERM-$f$DR solutions under different $f$-divergence regularizers; this numerical example highlights the practical implications of choosing different functions $f$ in ERM-$f$DR problems.
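A minimal finite-support sketch of the normalization step (our illustration, not the authors' algorithm): for convex $f$, the ERM-$f$DR solution has density $\mathrm{d}P/\mathrm{d}Q(\theta) = (f')^{-1}\big(-(L(\theta) + N)/\lambda\big)$ with respect to the reference measure $Q$, where the scalar $N$ is pinned down by normalization. The function names and bracketing strategy below are assumptions.

```python
# Sketch: finite-support ERM-fDR solution. For convex f, the optimal measure
# has density dP/dQ(i) = (f')^{-1}(-(L[i] + N) / lam) w.r.t. Q, where the
# scalar N (the normalization function evaluated at lam) makes P sum to 1.
import numpy as np
from scipy.optimize import brentq

def erm_fdr_solution(L, Q, lam, fprime_inv):
    """Find N by one-dimensional root finding, then return (P, N)."""
    def excess_mass(N):
        return np.sum(Q * fprime_inv(-(L + N) / lam)) - 1.0

    # excess_mass is decreasing in N; expand the bracket to a sign change.
    lo, hi = -1.0, 1.0
    while excess_mass(lo) < 0.0:
        lo *= 2.0
    while excess_mass(hi) > 0.0:
        hi *= 2.0
    N = brentq(excess_mass, lo, hi)
    return Q * fprime_inv(-(L + N) / lam), N

# KL regularization: f(x) = x log x, so (f')^{-1}(y) = exp(y - 1) and the
# solution reduces to the familiar Gibbs measure tilted by the risks.
L = np.array([0.9, 0.4, 0.1, 0.7])          # empirical risks of 4 models
Q = np.full(4, 0.25)                        # uniform reference measure
P, N = erm_fdr_solution(L, Q, lam=0.5, fprime_inv=lambda y: np.exp(y - 1.0))
print(P, P.sum(), N)
```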
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Austria > Vienna (0.14)
- (21 more...)
Optimal Anytime-Valid Tests for Composite Nulls
We consider the problem of designing optimal level-$\alpha$ power-one tests for composite nulls. Given a parameter $\alpha \in (0,1)$ and a stream of $\mathcal{X}$-valued observations $\{X_n : n \geq 1\} \overset{i.i.d.}{\sim} P$, the goal is to design a level-$\alpha$ power-one test $\tau_\alpha$ for the null $H_0: P \in \mathcal{P}_0 \subset \mathcal{P}(\mathcal{X})$. Prior works have shown that any such $\tau_\alpha$ must satisfy $\mathbb{E}_P[\tau_\alpha] \geq \tfrac{\log(1/\alpha)}{\gamma^*(P, \mathcal{P}_0)}$, where $\gamma^*(P, \mathcal{P}_0)$ is the so-called $\mathrm{KL}_{\inf}$, or minimum divergence of $P$ to the null class. In this paper, our objective is to develop and analyze constructive schemes that match this lower bound as $\alpha \downarrow 0$. We first consider the finite-alphabet case ($|\mathcal{X}| = m < \infty$) and show that a test based on a \emph{universal} $e$-process (formed by the ratio of a universal predictor and the running null MLE) is optimal in the above sense. The proof relies on a Donsker-Varadhan (DV) based saddle-point representation of $\mathrm{KL}_{\inf}$ and an application of Sion's minimax theorem. This characterization motivates a general method for arbitrary $\mathcal{X}$: construct an $e$-process based on the empirical solutions to the saddle-point representation over a sufficiently rich class of test functions. We give sufficient conditions for the optimality of this test for compact convex nulls, and verify them for Hölder smooth density models. We end the paper with a discussion on the computational aspects of implementing our proposed tests in some practical settings.
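A minimal finite-alphabet sketch of the universal $e$-process construction, under concrete choices that are ours rather than the paper's (binary alphabet, Krichevsky-Trofimov mixture as the universal predictor, null class $\{\mathrm{Bernoulli}(p) : p \leq 1/2\}$):

```python
# Sketch: universal e-process for the composite null H_0: p <= 1/2 on the
# binary alphabet. The e-process is the ratio of the Krichevsky-Trofimov
# (KT) mixture predictor to the running maximum likelihood over the null;
# we stop at the first time it crosses 1/alpha.
import numpy as np

def kt_log_prob(x):
    """Sequential log-probability under the KT (add-1/2) predictor."""
    ones, n, logp = 0, 0, 0.0
    for xi in x:
        p1 = (ones + 0.5) / (n + 1.0)
        logp += np.log(p1 if xi == 1 else 1.0 - p1)
        ones += xi
        n += 1
    return logp

def null_mle_log_prob(x):
    """Log-likelihood at the MLE restricted to the null {p <= 1/2}."""
    n, k = len(x), int(np.sum(x))
    p = np.clip(min(k / n, 0.5), 1e-12, 1.0 - 1e-12)
    return k * np.log(p) + (n - k) * np.log(1.0 - p)

def stopping_time(x, alpha=0.05):
    """First n at which the e-process crosses 1/alpha (None if never)."""
    for n in range(1, len(x) + 1):
        log_e = kt_log_prob(x[:n]) - null_mle_log_prob(x[:n])
        if log_e >= np.log(1.0 / alpha):
            return n
    return None

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.8, size=2000)   # true p = 0.8, outside the null
print(stopping_time(x, alpha=0.05))   # typically a few dozen observations
```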
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Performance Guarantees for Quantum Neural Estimation of Entropies
Sreekumar, Sreejith, Goldfeld, Ziv, Wilde, Mark M.
Estimating quantum entropies and divergences is an important problem in quantum physics, information theory, and machine learning. Quantum neural estimators (QNEs), which utilize a hybrid classical-quantum architecture, have recently emerged as an appealing computational framework for estimating these measures. Such estimators combine classical neural networks with parametrized quantum circuits, and their deployment typically entails tedious tuning of hyperparameters controlling the sample size, network architecture, and circuit topology. This work initiates the study of formal guarantees for QNEs of measured (Rényi) relative entropies in the form of non-asymptotic error bounds. We further establish exponential tail bounds showing that the error is sub-Gaussian, and thus sharply concentrates about the ground-truth value. For an appropriate sub-class of density operator pairs on a space of dimension $d$ with bounded Thompson metric, our theory establishes a copy complexity of $O(|\Theta(\mathcal{U})|d/\varepsilon^2)$ for QNEs with a quantum circuit parameter set $\Theta(\mathcal{U})$, which has minimax-optimal dependence on the accuracy $\varepsilon$. Additionally, if the density operator pairs are permutation invariant, we improve the dimension dependence above to $O(|\Theta(\mathcal{U})|\,\mathrm{polylog}(d)/\varepsilon^2)$. Our theory aims to facilitate principled implementation of QNEs for measured relative entropies and to guide hyperparameter tuning in practice.
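For intuition, here is a purely classical sketch (our illustration; no quantum circuit, single qubit, direct parametrization in place of a neural network) of the variational formula such estimators optimize for the measured relative entropy, $D^{\mathbb{M}}(\rho\|\sigma) = \sup_{H} \{\operatorname{Tr}(\rho H) - \log \operatorname{Tr}(\sigma e^{H})\}$ over Hermitian $H$:

```python
# Sketch: the variational objective behind neural estimators of the
# measured relative entropy. Classical stand-in: one qubit, H parametrized
# directly (a QNE would instead generate H via a network + circuit).
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def herm(theta):
    """Hermitian 2x2 matrix from 4 real (Pauli) coordinates."""
    a, x, y, z = theta
    return np.array([[a + z, x - 1j * y],
                     [x + 1j * y, a - z]])

def neg_objective(theta, rho, sigma):
    # Negative of Tr(rho H) - log Tr(sigma exp(H)); concave in H, so the
    # negative is convex and a local optimizer finds the supremum.
    H = herm(theta)
    return -(np.trace(rho @ H).real
             - np.log(np.trace(sigma @ expm(H)).real))

rho = np.diag([0.8, 0.2])
sigma = np.diag([0.5, 0.5])
res = minimize(neg_objective, np.zeros(4), args=(rho, sigma))
print("variational estimate:", -res.fun)
# Commuting states: the measured relative entropy equals the classical KL.
print("classical KL:", 0.8 * np.log(0.8 / 0.5) + 0.2 * np.log(0.2 / 0.5))
```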
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New Mexico (0.04)
- (5 more...)
Quantum Information Ordering and Differential Privacy
Dasgupta, Ayanava, Warsi, Naqueeb Ahmad, Hayashi, Masahito
We study quantum differential privacy (QDP) by defining a notion of the order of informativeness between two pairs of quantum states. In particular, we show that if the hypothesis testing divergence of one pair dominates that of the other, then this dominance holds for every $f$-divergence. This approach completely characterizes $(\varepsilon,\delta)$-QDP mechanisms by identifying the most informative $(\varepsilon,\delta)$-DP quantum state pairs. We apply this to analyze the stability of quantum differentially private learning algorithms, generalizing classical results to the case $\delta > 0$. Additionally, we study precise limits for privatized hypothesis testing and privatized quantum parameter estimation, including tight upper bounds on the quantum Fisher information under QDP. Finally, we establish near-optimal contraction bounds for differentially private quantum channels with respect to the hockey-stick divergence.
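Since the $(\varepsilon,\delta)$-QDP condition can be phrased through the quantum hockey-stick divergence $E_\gamma(\rho\|\sigma) = \operatorname{Tr}[(\rho - \gamma\sigma)_+]$ with $\gamma = e^{\varepsilon}$, a minimal numerical sketch is easy to give (the states below are arbitrary illustrations, not from the paper):

```python
# Sketch: quantum hockey-stick divergence E_gamma(rho||sigma), which
# characterizes (eps, delta)-QDP via E_{e^eps}(rho||sigma) <= delta for
# every pair of neighboring output states.
import numpy as np

def hockey_stick(rho, sigma, gamma):
    """E_gamma(rho||sigma) = Tr[(rho - gamma*sigma)_+]."""
    eigs = np.linalg.eigvalsh(rho - gamma * sigma)  # Hermitian difference
    return float(eigs[eigs > 0].sum())              # sum of positive part

def is_qdp(rho, sigma, eps, delta):
    """Check the (eps, delta)-QDP condition for one neighboring pair."""
    return hockey_stick(rho, sigma, np.exp(eps)) <= delta

rho = np.array([[0.7, 0.1], [0.1, 0.3]])
sigma = np.array([[0.6, 0.05], [0.05, 0.4]])
for eps in (0.0, 0.1, 0.5):
    print(eps, hockey_stick(rho, sigma, np.exp(eps)),
          is_qdp(rho, sigma, eps, delta=0.1))
```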
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > Singapore (0.04)
- (3 more...)
Contraction and entropy production in continuous-time Sinkhorn dynamics
Srinivasan, Anand, Slotine, Jean-Jacques
Recently, the vanishing-step-size limit of the Sinkhorn algorithm at finite regularization parameter $\varepsilon$ was shown to be a mirror descent in the space of probability measures. We give $L^2$ contraction criteria in two time-dependent metrics induced by the mirror Hessian, which reduce to the coercivity of certain conditional expectation operators. We then give an exact identity for the entropy production rate of the Sinkhorn flow, which was previously known only to be nonpositive. Examining this rate shows that the standard semigroup analysis of diffusion processes extends systematically to the Sinkhorn flow. We show that the flow induces a reversible Markov dynamics on the target marginal as an Onsager gradient flow. We define the Dirichlet form associated to its (nonlocal) infinitesimal generator, prove a Poincaré inequality for it, and show that the spectral gap is strictly positive along the Sinkhorn flow whenever $\varepsilon > 0$. Lastly, we show that the entropy decay is exponential if and only if a logarithmic Sobolev inequality (LSI) holds. For illustration, we give two immediate practical use cases for the Sinkhorn LSI: as a design principle for the latent space in which generative models are trained, and as a stopping heuristic for discrete-time algorithms.
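A minimal sketch of the stopping-heuristic idea in discrete time (our illustration, with assumed names, tolerances, and toy problem; the paper's analysis concerns the continuous-time flow): monitor the KL divergence between the current plan's marginal and its target across Sinkhorn sweeps, and read a near-constant ratio of successive KL values as the exponential decay an LSI guarantees.

```python
# Sketch: entropy-decay stopping heuristic for discrete Sinkhorn iterations.
import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def sinkhorn_entropy_stop(mu, nu, C, eps, tol=1e-10, max_iter=10_000):
    K = np.exp(-C / eps)
    v = np.ones_like(nu)
    kls = []
    for _ in range(max_iter):
        u = mu / (K @ v)                  # enforce the first marginal
        v = nu / (K.T @ u)                # enforce the second marginal
        row = u * (K @ v)                 # first marginal of current plan
        kls.append(kl(mu, row))
        if kls[-1] < tol:                 # entropy has decayed enough
            break
    # Under an LSI the KL decays exponentially, so the ratio of successive
    # values stabilizes; a drifting ratio signals slow, subexponential decay.
    rate = kls[-1] / kls[-2] if len(kls) >= 2 and kls[-2] > 0 else None
    return u[:, None] * K * v[None, :], kls, rate

rng = np.random.default_rng(1)
n = 50
mu = np.full(n, 1.0 / n)
nu = rng.random(n); nu /= nu.sum()
x = np.linspace(0.0, 1.0, n)
C = (x[:, None] - x[None, :]) ** 2        # squared-distance cost
plan, kls, rate = sinkhorn_entropy_stop(mu, nu, C, eps=0.2)
print(len(kls), kls[-1], rate)
```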
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Compression with Bayesian Implicit Neural Representations
Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the $\beta$-ELBO, and allows targeting different rate-distortion trade-offs for a given network architecture by adjusting $\beta$. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.
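A minimal sketch of the $\beta$-ELBO objective (our toy setup: a 1-D signal in place of an image, a tiny mean-field network, illustrative hyperparameters; the relative-entropy-coding step that replaces quantization is omitted):

```python
# Sketch: beta-ELBO for a variational Bayesian implicit neural
# representation, trading distortion against the rate term KL(q || p).
import torch
import torch.nn as nn

class VarLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over weights."""
    def __init__(self, d_in, d_out, prior_std=0.1):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(d_out, d_in) / d_in ** 0.5)
        self.log_std = nn.Parameter(torch.full((d_out, d_in), -4.0))
        self.prior_std = prior_std

    def forward(self, x):
        std = self.log_std.exp()
        w = self.mu + std * torch.randn_like(std)  # reparametrized sample
        return x @ w.T

    def kl(self):
        """KL(q(w) || p(w)) for Gaussian posterior, zero-mean prior."""
        var_ratio = (self.log_std.exp() / self.prior_std) ** 2
        return 0.5 * ((self.mu / self.prior_std) ** 2
                      + var_ratio - 1.0 - var_ratio.log()).sum()

coords = torch.linspace(-1, 1, 200).unsqueeze(1)   # "pixel" coordinates
signal = torch.sin(3.0 * coords)                   # "RGB" values

layers = nn.ModuleList([VarLinear(1, 32), VarLinear(32, 1)])
opt = torch.optim.Adam(layers.parameters(), lr=1e-2)
beta = 1e-4                                        # rate-distortion knob
for _ in range(2000):
    pred = layers[1](torch.sin(layers[0](coords))) # sine activations
    distortion = ((pred - signal) ** 2).sum()
    loss = distortion + beta * sum(l.kl() for l in layers)  # beta-ELBO
    opt.zero_grad(); loss.backward(); opt.step()
print("final distortion:", float(distortion))
```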
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > Middle East > Israel (0.04)