Collaborating Authors

 Alajaji, Fady


Classification Utility, Fairness, and Compactness via Tunable Information Bottleneck and Rényi Measures

arXiv.org Artificial Intelligence

Designing machine learning algorithms that are accurate yet fair, not discriminating based on any sensitive attribute, is of paramount importance for society to accept AI for critical applications. In this article, we propose a novel fair representation learning method termed the Rényi Fair Information Bottleneck Method (RFIB) which incorporates constraints for utility, fairness, and compactness (compression) of representation, and apply it to image and tabular data classification. A key attribute of our approach is that we consider - in contrast to most prior work - both demographic parity and equalized odds as fairness constraints, allowing for a more nuanced satisfaction of both criteria. Leveraging a variational approach, we show that our objectives yield a loss function involving classical Information Bottleneck (IB) measures and establish an upper bound in terms of two Rényi measures of order $\alpha$ on the mutual information IB term measuring compactness between the input and its encoded embedding. We study the influence of the $\alpha$ parameter as well as two other tunable IB parameters on achieving utility/fairness trade-off goals, and show that the $\alpha$ parameter gives an additional degree of freedom that can be used to control the compactness of the representation. Experimenting on three different image datasets (EyePACS, CelebA, and FairFace) and two tabular datasets (Adult and COMPAS), using both binary and categorical sensitive attributes, we show that on various utility, fairness, and compound utility/fairness metrics RFIB outperforms current state-of-the-art approaches.
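The tunable order parameter at the heart of this abstract can be illustrated with the Rényi divergence itself. The following is a minimal standalone sketch, not the paper's RFIB objective; the function name and example distributions are illustrative. It shows how the order $\alpha$ indexes a family of divergences that recovers the Kullback-Leibler divergence as $\alpha \to 1$ and grows with $\alpha$:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) of order alpha > 0 (in nats)
    between discrete distributions p and q; the limit alpha -> 1
    recovers the Kullback-Leibler divergence."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(p / q)))  # KL limit at alpha = 1
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
# D_alpha is nondecreasing in alpha, so the order parameter tunes how
# harshly the divergence penalizes mismatch between p and q.
d_half = renyi_divergence(p, q, 0.5)  # ~ 0.095 nats
d_two = renyi_divergence(p, q, 2.0)   # ~ 0.318 nats
```

Since the divergence is nondecreasing in $\alpha$, sweeping the order adjusts how strongly a divergence-based penalty binds, which is consistent with the extra degree of freedom over compactness that the abstract describes.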


A Unifying Generator Loss Function for Generative Adversarial Networks

arXiv.org Artificial Intelligence

A unifying $\alpha$-parametrized generator loss function is introduced for a dual-objective generative adversarial network (GAN), which uses a canonical (or classical) discriminator loss function such as the one in the original GAN (VanillaGAN) system. The generator loss function is based on a symmetric class probability estimation type function, $\mathcal{L}_\alpha$, and the resulting GAN system is termed $\mathcal{L}_\alpha$-GAN. Under an optimal discriminator, it is shown that the generator's optimization problem consists of minimizing a Jensen-$f_\alpha$-divergence, a natural generalization of the Jensen-Shannon divergence, where $f_\alpha$ is a convex function expressed in terms of the loss function $\mathcal{L}_\alpha$. It is also demonstrated that this $\mathcal{L}_\alpha$-GAN problem recovers as special cases a number of GAN problems in the literature, including VanillaGAN, Least Squares GAN (LSGAN), Least $k$th order GAN (L$k$GAN) and the recently introduced $(\alpha_D,\alpha_G)$-GAN with $\alpha_D=1$. Finally, experiments are conducted on three datasets (MNIST, CIFAR-10, and Stacked MNIST) to illustrate the performance of various examples of the $\mathcal{L}_\alpha$-GAN system.
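The Jensen-Shannon divergence that the Jensen-$f_\alpha$-divergence generalizes can be computed directly for discrete distributions. As a minimal sketch (illustrative only; the function names are mine, and this is the VanillaGAN special case rather than the general $\mathcal{L}_\alpha$-GAN loss):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence (in nats) for strictly positive pmfs."""
    return float(np.sum(p * np.log(p / q)))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by log 2, and (up to
    constants) the quantity the VanillaGAN generator minimizes under an
    optimal discriminator."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture midpoint
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

a = np.array([0.7, 0.3])
b = np.array([0.3, 0.7])
jsd = jensen_shannon(a, b)  # symmetric: equals jensen_shannon(b, a)
```

Replacing the two KL terms with an $f$-divergence built from a convex $f_\alpha$ yields the kind of Jensen-type generalization the abstract refers to, with the Jensen-Shannon case recovered for a particular choice of $f$.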


Rényi Generative Adversarial Networks

arXiv.org Machine Learning

Unsupervised learning is the problem of educing information from a large unlabeled dataset and, in the context of generative models, is a relatively new area that has received much attention. Two prominent objectives in generative modeling consist of determining the underlying probability distribution function of a dataset or generating data that mimics it. Classical techniques for the former include maximum likelihood estimators, method of moments estimators and Bayesian estimators. The main approaches for the latter include generative adversarial networks (GANs) [15], [5], [36], [10], autoencoders/variational autoencoders (VAEs) [22], generative autoregressive models [34], invertible flow-based latent vector models [23], or hybrids of the above models [16]. Compared to other approaches, GANs have garnered the most interest (e.g., see surveys in [10], [43], [44]); unlike density estimation models, GANs can efficiently represent distributions confined to a low dimensional manifold [5] and are therefore the focus of this paper.

Prior Work: The original GAN [15] consists of a generative neural network competing with a discriminative neural network in a min-max game. GANs were enhanced with the introduction of deep convolutional GANs (DCGANs) [36] which use convolutional layers to learn higher dimensional dependencies that are inherent in complex datasets such as images [36].


Information Extraction Under Privacy Constraints

arXiv.org Machine Learning

A privacy-constrained information extraction problem is considered where for a pair of correlated discrete random variables $(X,Y)$ governed by a given joint distribution, an agent observes $Y$ and wants to convey to a potentially public user as much information about $Y$ as possible without compromising the amount of information revealed about $X$. To this end, the so-called rate-privacy function is introduced to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from $Y$ under a privacy constraint between $X$ and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed and information-theoretic and estimation-theoretic interpretations of it are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of $(X,Y)$. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where $(X,Y)$ has a joint probability density function by studying the problem where the extracted information is a uniform quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case.
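Both sides of the rate-privacy trade-off are mutual information terms, which for discrete $(X,Y)$ can be evaluated directly from the joint distribution. A minimal sketch (illustrative only; the function name and toy joint pmfs are mine, not the paper's construction):

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X;Y) in nats from a joint pmf over (X, Y):
    I(X;Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) )."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal of X (column vector)
    py = joint.sum(axis=0, keepdims=True)  # marginal of Y (row vector)
    mask = joint > 0                       # convention: 0 log 0 = 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask])))

# Independent X and Y carry no information about each other...
indep = np.outer([0.5, 0.5], [0.5, 0.5])
# ...while a perfectly correlated binary pair reveals one full bit (ln 2 nats).
corr = np.array([[0.5, 0.0],
                 [0.0, 0.5]])
```

The rate-privacy function then maximizes $I(Y;Z)$ over extracted representations $Z$ subject to a cap on the corresponding privacy leakage measure against $X$; this snippet only evaluates the mutual information terms entering that optimization.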