Distributional Evaluation of Generative Models via Relative Density Ratio
We propose a function-valued evaluation metric for generative models based on the relative density ratio (RDR), designed to characterize distributional differences between real and generated samples. As an evaluation metric, the RDR function preserves the $ϕ$-divergence between the two distributions, enables sample-level evaluation that facilitates downstream investigations of feature-specific distributional differences, and has a bounded range that affords clear interpretability and numerical stability. The RDR function is estimated efficiently by optimizing the variational form of the $ϕ$-divergence. We provide theoretical convergence rate guarantees for general estimators based on M-estimator theory, as well as convergence rates for neural network-based estimators when the true ratio lies in an anisotropic Besov space. We demonstrate the power of the proposed RDR-based evaluation through numerical experiments on MNIST, CelebA64, and the American Gut Project microbiome data. The estimated RDR enables not only effective overall comparison of competing generative models, but also a convenient way to reveal the underlying nature of goodness-of-fit: it allows one to assess support overlap, coverage, and fidelity while pinpointing regions of the sample space where generators concentrate and revealing the features that drive the most salient distributional differences.
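A minimal sketch of the estimation idea, assuming a least-squares variational criterion (a standard surrogate whose exact minimizer is the relative density ratio $r_α(x) = p(x)/(αp(x)+(1-α)q(x))$) rather than the paper's particular $ϕ$-divergence objective; the PyTorch network, toy Gaussian data, and $α = 0.5$ below are illustrative assumptions:

```python
# Hedged sketch: fit the relative density ratio with a small network by
# minimizing (1/2) E_m[g^2] - E_P[g] over g, where m = alpha*P + (1-alpha)*Q;
# the exact minimizer of this criterion is r_alpha = p / (alpha*p + (1-alpha)*q).
import torch

alpha = 0.5
p_samples = torch.randn(2000, 1)              # stand-in "real" data: N(0, 1)
q_samples = 1.0 + 1.5 * torch.randn(2000, 1)  # stand-in "generated" data

net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1), torch.nn.Sigmoid(),  # sigmoid/alpha enforces range (0, 1/alpha)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    g_p = net(p_samples) / alpha  # candidate ratio under P
    g_q = net(q_samples) / alpha  # candidate ratio under Q
    loss = 0.5 * (alpha * (g_p**2).mean() + (1 - alpha) * (g_q**2).mean()) - g_p.mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sample-level scores on generated data: values near 1/alpha flag regions the
# generator misses relative to P; values near 0 flag spurious generated mass.
scores = net(q_samples) / alpha
```

Note how the bounded range $[0, 1/α]$ is enforced architecturally here via the sigmoid output; values near 1 indicate regions where the two distributions agree.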
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.90)
Learning Energy-based Variational Latent Prior for VAEs
Dutta, Debottam, Amballa, Chaitanya, Xu, Zhongweiyang, Wei, Yu-Lin, Choudhury, Romit Roy
Variational Auto-Encoders (VAEs) are known to generate blurry and inconsistent samples. One reason for this is the "prior hole" problem: regions that have high probability under the VAE's prior but low probability under the VAE's posterior. This means that during data generation, high-probability samples from the prior can have low probability under the posterior, resulting in poor-quality data. Ideally, a prior should be flexible enough to match the posterior while retaining the ability to generate samples quickly, a tradeoff that generative models continue to grapple with. This paper proposes to model the prior as an energy-based model (EBM). While EBMs are known to offer the flexibility to match posteriors (thereby also improving the ELBO), they are traditionally slow in sample generation due to their dependence on MCMC methods. Our key idea is to bring a variational approach to tackling the normalization constant in EBMs, thus bypassing expensive MCMC. The variational form can be approximated with a sampler network, and we show that training the prior this way can be formulated as an alternating optimization problem. Moreover, the same sampler reduces to an implicit variational prior during generation, providing efficient and fast sampling. We compare our Energy-based Variational Latent Prior (EVaLP) method to multiple SOTA baselines and show improvements in image generation quality, reduced prior holes, and better sampling efficiency.
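A toy sketch of the alternating scheme just described, assuming a latent EBM prior $p(z) ∝ \exp(-E_θ(z))$ and a reparameterized Gaussian sampler standing in for the paper's sampler network; the Gibbs variational identity $\log Z = \max_q \mathbb{E}_q[-E_θ(z)] + H(q)$ is what lets the sampler replace MCMC. The architecture and the stand-in "aggregate posterior" samples are assumptions, not EVaLP itself:

```python
# Toy sketch (not EVaLP itself): alternate between (1) a Gaussian sampler
# q_phi tightening the Gibbs bound log Z = max_q E_q[-E(z)] + H(q), and
# (2) a maximum-likelihood-style energy update with q_phi samples standing
# in for the MCMC negatives a latent EBM prior would otherwise need.
import math
import torch

dim = 2
E_theta = torch.nn.Sequential(              # scalar energy over latents
    torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
mu = torch.zeros(dim, requires_grad=True)   # sampler q_phi = N(mu, diag(exp(log_std)^2))
log_std = torch.zeros(dim, requires_grad=True)
opt_e = torch.optim.Adam(E_theta.parameters(), lr=1e-3)
opt_s = torch.optim.Adam([mu, log_std], lr=1e-3)

def sample_q(n):                            # reparameterized draws from q_phi
    return mu + log_std.exp() * torch.randn(n, dim)

# Stand-in for aggregate VAE posterior samples (an assumption of this toy).
posterior_z = 1.0 + 0.5 * torch.randn(4096, dim)

for step in range(3000):
    # (1) Sampler step: minimize E_q[E_theta] - H(q), i.e. tighten the bound.
    z = sample_q(256)
    entropy = log_std.sum() + 0.5 * dim * (1 + math.log(2 * math.pi))
    sampler_loss = E_theta(z).mean() - entropy
    opt_s.zero_grad(); sampler_loss.backward(); opt_s.step()
    # (2) Energy step: push energy down on posterior samples, up on q_phi samples.
    idx = torch.randint(0, posterior_z.shape[0], (256,))
    energy_loss = E_theta(posterior_z[idx]).mean() - E_theta(sample_q(256).detach()).mean()
    opt_e.zero_grad(); energy_loss.backward(); opt_e.step()

# At generation time, z ~ q_phi replaces MCMC, giving fast prior sampling.
```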
Two-sample comparison through additive tree models for density ratios
Awaya, Naoki, Xu, Yuliang, Ma, Li
The ratio of two densities characterizes their differences. We consider learning the density ratio given i.i.d. observations from each of the two distributions. We propose additive tree models for the density ratio along with efficient algorithms for training these models using a new loss function called the balancing loss. With this loss, additive tree models for the density ratio can be trained using algorithms originally designed for supervised learning. Specifically, they can be trained from both an optimization perspective that parallels tree boosting and from a (generalized) Bayesian perspective that parallels Bayesian additive regression trees (BART). For the former, we present two boosting algorithms -- one based on forward-stagewise fitting and the other based on gradient boosting, both of which produce a point estimate for the density ratio function. For the latter, we show that due to the loss function's resemblance to an exponential family kernel, the new loss can serve as a pseudo-likelihood for which conjugate priors exist, thereby enabling effective generalized Bayesian inference on the density ratio using backfitting samplers designed for BART. The resulting uncertainty quantification on the inferred density ratio is critical for applications involving high-dimensional and complex distributions in which uncertainty given limited data can often be substantial. We provide insights on the balancing loss through its close connection to the exponential loss in binary classification and to the variational form of f-divergence, in particular that of the squared Hellinger distance. Our numerical experiments demonstrate the accuracy of the proposed approach while providing unique capabilities in uncertainty quantification. We demonstrate the application of our method in a case study assessing the quality of generative models for microbiome compositional data.
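To illustrate the stated connection to the exponential loss (not the paper's balancing-loss algorithm or its BART-style sampler), here is a hedged boosting sketch for the log density ratio $f = \log(p/q)$ under the loss $E_P[e^{-f/2}] + E_Q[e^{f/2}]$, whose pointwise minimizer is $\log(p/q)$ and whose optimal value recovers the Hellinger affinity; the toy data and tree settings are assumptions:

```python
# Hedged sketch of gradient boosting for f = log(p/q) under the exponential
# loss L(f) = E_P[exp(-f/2)] + E_Q[exp(f/2)], an illustration of the stated
# connection -- not the paper's balancing loss or its Bayesian backfitting.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=(1000, 1))  # samples from P
x_q = rng.normal(0.5, 1.2, size=(1000, 1))  # samples from Q
X = np.vstack([x_p, x_q])
is_p = np.arange(len(X)) < len(x_p)

f = np.zeros(len(X))                        # additive model values at training points
trees, lr = [], 0.1
for m in range(200):
    # Pointwise negative gradient of the loss: +exp(-f/2)/2 on P points,
    # -exp(f/2)/2 on Q points (equal sample sizes, so the weights match).
    neg_grad = np.where(is_p, 0.5 * np.exp(-f / 2), -0.5 * np.exp(f / 2))
    tree = DecisionTreeRegressor(max_depth=2).fit(X, neg_grad)
    trees.append(tree)
    f += lr * tree.predict(X)

# exp(f) approximates the density ratio p/q at the training points; summing
# lr * tree.predict(x_new) over the stored trees extends it to new points.
```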
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Multiple Wasserstein Gradient Descent Algorithm for Multi-Objective Distributional Optimization
Nguyen, Dai Hai, Mamitsuka, Hiroshi, Nakamura, Atsuyoshi
We address the optimization problem of simultaneously minimizing multiple objective functionals over a family of probability distributions. This type of Multi-Objective Distributional Optimization commonly arises in machine learning and statistics, with applications in areas such as multiple target sampling, multi-task learning, and multi-objective generative modeling. To solve this problem, we propose an iterative particle-based algorithm, Multiple Wasserstein Gradient Descent (MWGraD), which constructs a flow of intermediate empirical distributions, each represented by a set of particles, that gradually minimizes the multiple objective functionals simultaneously. Specifically, MWGraD consists of two key steps at each iteration. First, it estimates the Wasserstein gradient for each objective functional based on the current particles. Then, it aggregates these gradients into a single Wasserstein gradient using dynamically adjusted weights and updates the particles accordingly. In addition, we provide theoretical analysis and present experimental results on both synthetic and real-world datasets, demonstrating the effectiveness of MWGraD.
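A minimal particle sketch of the two steps just described, under the simplifying assumption that each objective functional is a potential energy $F_i(ρ) = E_ρ[V_i]$, whose Wasserstein gradient at the particles is simply $∇V_i$; the two quadratic wells and the gradient-norm weighting rule are illustrative stand-ins for MWGraD's dynamically adjusted weights:

```python
# Minimal particle sketch (not MWGraD as specified in the paper): descend two
# potential-energy functionals F_i(rho) = E_rho[V_i] simultaneously; the
# Wasserstein gradient of F_i at the particle locations is grad V_i.
import numpy as np

rng = np.random.default_rng(0)
particles = rng.normal(size=(500, 2))

def grad_v1(x):                 # quadratic well centered at (-2, 0)
    return x - np.array([-2.0, 0.0])

def grad_v2(x):                 # quadratic well centered at (+2, 0)
    return x - np.array([+2.0, 0.0])

eta = 0.05
for step in range(400):
    g1, g2 = grad_v1(particles), grad_v2(particles)
    # Dynamically adjusted weights (an assumed rule): normalize by average
    # gradient norms so neither objective dominates the aggregated update.
    w1 = 1.0 / np.linalg.norm(g1, axis=1).mean()
    w2 = 1.0 / np.linalg.norm(g2, axis=1).mean()
    w1, w2 = w1 / (w1 + w2), w2 / (w1 + w2)
    particles -= eta * (w1 * g1 + w2 * g2)  # aggregated Wasserstein gradient step

# The particle cloud settles near the weighted compromise of the two wells.
```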
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)
Generalization Bounds for Quantum Learning via Rényi Divergences
Warsi, Naqueeb Ahmad, Dasgupta, Ayanava, Hayashi, Masahito
This work advances the theoretical understanding of quantum learning by establishing a new family of upper bounds on the expected generalization error of quantum learning algorithms, leveraging the framework introduced by Caro et al. (2024) and a new definition for the expected true loss. Our primary contribution is the derivation of these bounds in terms of quantum and classical Rényi divergences, utilizing a variational approach for evaluating quantum Rényi divergences, specifically the Petz and a newly introduced modified sandwich quantum Rényi divergence. Analytically and numerically, we demonstrate the superior performance of the bounds derived using the modified sandwich quantum Rényi divergence compared to those based on the Petz divergence. Furthermore, we provide probabilistic generalization error bounds using two distinct techniques: one based on the modified sandwich quantum Rényi divergence and classical Rényi divergence, and another employing smooth max Rényi divergence.
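For readers unfamiliar with the quantities involved, the following numerical sketch evaluates the standard Petz and sandwiched quantum Rényi divergences on random density matrices; the paper's modified sandwich divergence is a new definition and is not reproduced here:

```python
# Numerical sketch of the standard definitions: Petz Renyi divergence
# (1/(a-1)) log Tr[rho^a sigma^(1-a)] and sandwiched Renyi divergence
# (1/(a-1)) log Tr[(sigma^((1-a)/2a) rho sigma^((1-a)/2a))^a], with matrix
# powers computed via eigendecomposition.
import numpy as np

def mat_pow(A, t):              # A^t for a Hermitian positive definite matrix
    w, V = np.linalg.eigh(A)
    return (V * w**t) @ V.conj().T

def petz_renyi(rho, sigma, a):
    return np.log(np.trace(mat_pow(rho, a) @ mat_pow(sigma, 1 - a)).real) / (a - 1)

def sandwiched_renyi(rho, sigma, a):
    s = mat_pow(sigma, (1 - a) / (2 * a))
    return np.log(np.trace(mat_pow(s @ rho @ s, a)).real) / (a - 1)

def random_density(d, rng):     # random full-rank density matrix
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = G @ G.conj().T
    return R / np.trace(R).real

rng = np.random.default_rng(1)
rho, sigma = random_density(3, rng), random_density(3, rng)
for a in (0.6, 1.5, 2.0):
    print(a, petz_renyi(rho, sigma, a), sandwiched_renyi(rho, sigma, a))
# By the Araki-Lieb-Thirring inequality the sandwiched value never exceeds
# the Petz value, which is why tighter bounds are possible in that family.
```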
Design Amortization for Bayesian Optimal Experimental Design
Kennamer, Noble, Walton, Steven, Ihler, Alexander
Bayesian optimal experimental design is a sub-field of statistics focused on developing methods to make efficient use of experimental resources. Any potential design is evaluated in terms of a utility function, such as the (theoretically well-justified) expected information gain (EIG); unfortunately, under most circumstances the EIG is intractable to evaluate. In this work we build on successful variational approaches, which optimize a parameterized variational model with respect to bounds on the EIG. Past work focused on learning a new variational model from scratch for each new design considered. Here we present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs. To further improve computational efficiency, we also propose to train the variational model on a significantly cheaper-to-evaluate lower bound, and show empirically that the resulting model provides an excellent guide for more accurate, but expensive-to-evaluate, bounds on the EIG. We demonstrate the effectiveness of our technique on generalized linear models, a class of statistical models widely used in the analysis of controlled experiments. Experiments show that our method greatly improves accuracy over existing approximation strategies, with far better sample efficiency.
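As a hedged illustration of the quantity being amortized (not the paper's architecture), the following sketch computes a nested Monte Carlo estimate of the EIG for a toy linear-Gaussian model $y \sim N(dθ, 1)$, $θ \sim N(0,1)$, where the analytic value $\tfrac{1}{2}\log(1+d^2)$ is available as a check:

```python
# Hedged sketch of the target quantity (not the paper's amortized model):
# nested Monte Carlo estimate of EIG(d) = E[log p(y|theta,d) - log p(y|d)].
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

def eig_nmc(d, n_outer=2000, n_inner=2000):
    theta = rng.normal(size=n_outer)
    y = d * theta + rng.normal(size=n_outer)
    log_lik = -0.5 * (y - d * theta) ** 2 - 0.5 * np.log(2 * np.pi)
    # Marginal likelihood log p(y|d) via a fresh inner sample of theta.
    theta_in = rng.normal(size=(n_inner, 1))
    log_marg = -0.5 * (y[None, :] - d * theta_in) ** 2 - 0.5 * np.log(2 * np.pi)
    log_p_y = logsumexp(log_marg, axis=0) - np.log(n_inner)
    return np.mean(log_lik - log_p_y)

for d in (0.5, 1.0, 2.0):
    print(d, eig_nmc(d), 0.5 * np.log1p(d**2))  # estimate vs. analytic EIG
```

Running a fresh nested estimate like this for every candidate design is exactly the per-design cost that amortizing a single variational model across designs is meant to avoid.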