Country
Fairness With Minimal Harm: A Pareto-Optimal Approach For Healthcare
Martinez, Natalia, Bertran, Martin, Sapiro, Guillermo
Common fairness definitions in machine learning focus on balancing notions of disparity and utility. In this work, we study fairness in the context of risk disparity among sub-populations. We are interested in learning models that minimize performance discrepancies across sensitive groups without causing unnecessary harm. This is relevant to high-stakes domains such as healthcare, where non-maleficence is a core principle. We formalize this objective using Pareto frontiers, and provide analysis, based on recent works in fairness, to exemplify scenarios were perfect fairness might not be feasible without doing unnecessary harm. We present a methodology for training neural networks that achieve our goal by dynamically re-balancing subgroups risks. We argue that even in domains where fairness at cost is required, finding a non-unnecessary-harm fairness model is the optimal initial step. We demonstrate this methodology on real case-studies of predicting ICU patient mortality, and classifying skin lesions from dermatoscopic images.
Generalized Maximum Causal Entropy for Inverse Reinforcement Learning
Mai, Tien, Chan, Kennard, Jaillet, Patrick
We consider the problem of learning from demonstrated trajectories with inverse reinforcement learning (IRL). Motivated by a limitation of the classical maximum entropy model (Ziebart, Bagnell, and Dey 2010) in capturing the structure of the network of states, we propose an IRL model based on a generalized version of the causal entropy maximization problem, which allows us to generate a class of maximum entropy IRL models. Our generalized model has an advantage of being able to recover, in addition to a reward function, another expert's function that would (partially) capture the impact of the connecting structure of the states on experts' decisions. Empirical evaluation on a real-world dataset and a grid-world dataset shows that our generalized model outperforms the classical ones, in terms of recovering reward functions and demonstrated trajectories.
Query Complexity of Bayesian Private Learning
We study the query complexity of Bayesian Private Learning: a learner wishes to locate a random target within an interval by submitting queries, in the presence of an adversary who observes all of her queries but not the responses. How many queries are necessary and sufficient in order for the learner to accurately estimate the target, while simultaneously concealing the target from the adversary? Our main result is a query complexity lower bound that is tight up to the first order. We show that if the learner wants to estimate the target within an error of $\varepsilon$, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than $1/L$, then the query complexity is on the order of $L\log(1/\varepsilon)$, as $\varepsilon \to 0$. Our result demonstrates that increased privacy, as captured by $L$, comes at the expense of a {multiplicative} increase in query complexity. Our proof method builds on Fano's inequality and a family of proportional-sampling estimators. As an illustration of the method's wider applicability, we generalize the complexity lower bound to settings involving high-dimensional linear query learning and partial adversary observation.
Dynamic Modeling and Equilibria in Fair Decision Making
Williams, Joshua, Kolter, J. Zico
Recent studies on fairness in automated decision making systems have both investigated the potential future impact of these decisions on the population at large, and emphasized that imposing ''typical'' fairness constraints such as demographic parity or equality of opportunity does not guarantee a benefit to disadvantaged groups. However, these previous studies have focused on either simple one-step cost/benefit criteria, or on discrete underlying state spaces. In this work, we first propose a natural continuous representation of population state, governed by the Beta distribution, using a loan granting setting as a running example. Next, we apply a model of population dynamics under lending decisions, and show that when conditional payback probabilities are estimated correctly 1) ``optimal'' behavior by lenders can lead to ''Matthew Effect'' bifurcations (i.e., ''the rich get richer and the poor get poorer''), but that 2) many common fairness constraints on the allowable policies cause groups to converge to the same equilibrium point. Last, we contrast our results in the case of misspecified conditional probability estimates with prior work, and show that for this model, different levels of group misestimation guarantees that even fair policies lead to bifurcations. We illustrate some of the modeling conclusions on real data from credit scoring.
TinyCNN: A Tiny Modular CNN Accelerator for Embedded FPGA
In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. However, CNN-based methods are computational-intensive and resource-consuming, and thus are hard to be integrated into embedded systems such as smart phones, smart glasses, and robots. FPGA is one of the most promising platforms for accelerating CNN, but the limited on-chip memory size limit the performance of FPGA accelerator for CNN. In this paper, we propose a framework for designing CNN accelerator on embedded FPGA for image classification. The proposed framework provides a tool for FPGA resource-aware design space exploration of CNNs and automatically generates the hardware description of the CNN to be programmed on a target FPGA. The framework consists of three main backends; software, hardware generation, and simulation/precision adjustment. The software backend serves as an API to the designer to design the CNN and train it according to the hardware resources that are available. Using the CNN model, hardware backend generates the necessary hardware components and integrates them to generate the hardware description of the CNN. Finaly, Simulation/precision adjustment backend adjusts the inter-layer precision units to minimize the classification error. We used 16-bit fixed-point data in a CNN accelerator (FPGA) and compared it to the exactly similar software version running on an ARM processor (32-bit floating point data). We encounter about 3% accuracy loss in classification of the accelerated (FPGA) version. In return, we got up to 15.75x speedup by classifying with the accelerated version on the FPGA.
Unsupervised Attributed Multiplex Network Embedding
Park, Chanyoung, Kim, Donghyun, Han, Jiawei, Yu, Hwanjo
Nodes in a multiplex network are connected by multiple types of relations. However, most existing network embedding methods assume that only a single type of relation exists between nodes. Even for those that consider the multiplexity of a network, they overlook node attributes, resort to node labels for training, and fail to model the global properties of a graph. We present a simple yet effective unsupervised network embedding method for attributed multiplex network called DMGI, inspired by Deep Graph Infomax (DGI) that maximizes the mutual information between local patches of a graph, and the global representation of the entire graph. We devise a systematic way to jointly integrate the node embeddings from multiple graphs by introducing 1) the consensus regularization framework that minimizes the disagreements among the relation-type specific node embeddings, and 2) the universal discriminator that discriminates true samples regardless of the relation types. We also show that the attention mechanism infers the importance of each relation type, and thus can be useful for filtering unnecessary relation types as a preprocessing step. Extensive experiments on various downstream tasks demonstrate that DMGI outperforms the state-of-the-art methods, even though DMGI is fully unsupervised.
Penalized k-means algorithms for finding the correct number of clusters in a dataset
Kamgar-Parsi, Behzad, Kamgar-Parsi, Behrooz
In many applications we want to find the number of clusters in a dataset. A common approach is to use the penalized k-means algorithm with an additive penalty term linear in the number of clusters. An open problem is estimating the value of the coefficient of the penalty term. Since estimating the value of the coefficient in a principled manner appears to be intractable for general clusters, we investigate "ideal clusters", i.e. identical spherical clusters with no overlaps and no outlier background noise. In this paper: (a) We derive, for the case of ideal clusters, rigorous bounds for the coefficient of the additive penalty. Unsurprisingly, the bounds depend on the correct number of clusters, which we want to find in the first place. We further show that additive penalty, even for this simplest case of ideal clusters, typically produces a weak and often ambiguous signature for the correct number of clusters. (b) As an alternative, we examine the k-means with multiplicative penalty, and show that this parameter-free formulation has a stronger, and less often ambiguous, signature for the correct number of clusters. We also empirically investigate certain types of deviations from ideal cluster assumption and show that combination of k-means with additive and multiplicative penalties can resolve ambiguous solutions.
A nonparametric framework for inferring orders of categorical data from category-real ordered pairs
Amornbunchornvej, Chainarong, Surasvadi, Navaporn, Plangprasopchok, Anon, Thajchayapong, Suttipong
Given a dataset of careers and incomes, how large a difference of income between any pair of careers would be? Given a dataset of travel time records, how long do we need to spend more when choosing a public transportation mode $A$ instead of $B$ to travel? In this paper, we propose a framework that is able to infer orders of categories as well as magnitudes of difference of real numbers between each pair of categories using Estimation statistics framework. Not only reporting whether an order of categories exists, but our framework also reports the magnitude of difference of each consecutive pairs of categories in the order. In large dataset, our framework is scalable well compared with the existing framework. The proposed framework has been applied to two real-world case studies: 1) ordering careers by incomes based on information of 350,000 households living in Khon Kaen province, Thailand, and 2) ordering sectors by closing prices based on 1060 companies' closing prices of NASDAQ stock markets between years 2000 and 2016. The results of careers ordering show income inequality among different careers. The stock market results illustrate dynamics of sector domination that can change over time. Our approach is able to be applied in any research area that has category-real ordered pairs. Our proposed "Dominant-Distribution Network" provides a novel approach to gain new insight of analyzing category orders. The software of this framework is available for researchers or practitioners within R package: EDOIF.
Generative Models for Effective ML on Private, Decentralized Datasets
Augenstein, Sean, McMahan, H. Brendan, Ramage, Daniel, Ramaswamy, Swaroop, Kairouz, Peter, Chen, Mingqing, Mathews, Rajiv, Arcas, Blaise Aguera y
To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data - of representative samples, of outliers, of misclassifications - is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-provided labels. However, manual data inspection is problematic for privacy sensitive datasets, such as those representing the behavior of real-world individuals. Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs such as metrics or model parameters. This paper demonstrates that generative models - trained using federated methods and with formal differential privacy guarantees - can be used effectively to debug many commonly occurring data issues even when the data cannot be directly inspected. We explore these methods in applications to text with differentially private federated RNNs and to images using a novel algorithm for differentially private federated GANs.
MMGAN: Generative Adversarial Networks for Multi-Modal Distributions
Pandeva, Teodora, Schubert, Matthias
Over the past years, Generative Adversarial Networks (GANs) have shown a remarkable generation performance especially in image synthesis. Unfortunately, they are also known for having an unstable training process and might loose parts of the data distribution for heterogeneous input data. In this paper, we propose a novel GAN extension for multi-modal distribution learning (MMGAN). In our approach, we model the latent space as a Gaussian mixture model with a number of clusters referring to the number of disconnected data manifolds in the observation space, and include a clustering network, which relates each data manifold to one Gaussian cluster. Thus, the training gets more stable. Moreover, MMGAN allows for clustering real data according to the learned data manifold in the latent space. By a series of benchmark experiments, we illustrate that MMGAN outperforms competitive state-of-the-art models in terms of clustering performance.