Accuracy
Entity Aware Negative Sampling with Auxiliary Loss of False Negative Prediction for Knowledge Graph Embedding
Knowledge graph (KG) embedding is widely used in many downstream applications of KGs. Since KGs generally contain only ground-truth triples, negative samples must be constructed artificially for KG representation learning. Recently, various methods for sampling high-quality negatives have been studied because the quality of negative triples has a great effect on KG embedding. In this paper, we propose a novel method called Entity Aware Negative Sampling (EANS), which samples negative entities that resemble the positive one by applying a Gaussian distribution to the aligned entity index space. Additionally, we introduce an auxiliary loss for false negative prediction that alleviates the impact of sampled false negative triples. The proposed method can generate high-quality negative samples regardless of the negative sample size and effectively mitigates the influence of false negative samples. The experimental results on standard benchmarks show that EANS outperforms existing state-of-the-art negative sampling methods on several knowledge graph embedding models. Moreover, the proposed method achieves competitive performance even when the number of negative samples is limited to one.
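A minimal sketch of Gaussian sampling in an index space, assuming entities have already been re-indexed so that similar entities receive nearby integer indices (how EANS builds that aligned space is not described here, so this setup and the function name `sample_negative_indices` are hypothetical). A Gaussian offset centred on the positive entity's index then favours negatives that resemble the positive.

```python
import random


def sample_negative_indices(pos_index, num_entities, sigma, k, rng=None):
    """Draw k negative entity indices around pos_index in an aligned index space.

    Hypothetical setup: entities are assumed to be re-indexed so that similar
    entities get nearby integer indices; the Gaussian offset centred on the
    positive index therefore tends to pick "hard" negatives.
    """
    if num_entities < 2:
        raise ValueError("need at least two entities to sample a negative")
    rng = rng or random.Random()
    negatives = []
    while len(negatives) < k:
        offset = round(rng.gauss(0.0, sigma))
        if offset == 0:
            # A zero offset would return the positive itself; step to a neighbour.
            offset = rng.choice((-1, 1))
        j = (pos_index + offset) % num_entities  # wrap around the index range
        if j != pos_index:  # offsets that are multiples of num_entities collide
            negatives.append(j)
    return negatives
```

Here `sigma` controls how "hard" the negatives are: a small value keeps samples close to the positive index, while a large value approaches uniform sampling.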
Fair Federated Learning via Bounded Group Loss
Hu, Shengyuan, Wu, Zhiwei Steven, Smith, Virginia
Fair prediction across protected groups is an important constraint for many federated learning applications. However, prior work on group-fair federated learning lacks formal convergence or fairness guarantees. In this work, we propose a general framework for provably fair federated learning. In particular, we explore and extend the notion of Bounded Group Loss as a theoretically grounded approach to group fairness. Using this setup, we propose a scalable federated optimization method that optimizes the empirical risk under a number of group fairness constraints. We provide convergence guarantees for the method as well as fairness guarantees for the resulting solution. Empirically, we evaluate our method across common benchmarks from fair ML and federated learning, showing that it can provide both fairer and more accurate predictions than baseline approaches.
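To make the Bounded Group Loss constraint concrete, here is a minimal, centralized sketch that assumes the constraint takes the form "each group's average loss is at most gamma" and handles it with a standard Lagrangian and projected dual ascent. It is not the paper's federated algorithm; the function names are hypothetical.

```python
from collections import defaultdict


def group_losses(losses, groups):
    """Average loss per group label."""
    totals, counts = defaultdict(float), defaultdict(int)
    for loss, g in zip(losses, groups):
        totals[g] += loss
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def lagrangian(losses, groups, lambdas, gamma):
    """Empirical risk plus multiplier-weighted violations of loss_g <= gamma."""
    risk = sum(losses) / len(losses)
    per_group = group_losses(losses, groups)
    penalty = sum(lambdas.get(g, 0.0) * (per_group[g] - gamma) for g in per_group)
    return risk + penalty


def dual_step(losses, groups, lambdas, gamma, lr):
    """Projected gradient ascent on the multipliers (kept non-negative)."""
    per_group = group_losses(losses, groups)
    return {
        g: max(0.0, lambdas.get(g, 0.0) + lr * (per_group[g] - gamma))
        for g in per_group
    }
```

Groups whose loss exceeds `gamma` accumulate a larger multiplier, which increases their weight in the next primal (model) update.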
MACE: A Flexible Framework for Membership Privacy Estimation in Generative Models
Xu, Yixi, Mukherjee, Sumit, Liu, Xiyang, Tople, Shruti, Dodhia, Rahul, Ferres, Juan Lavista
Generative machine learning models are increasingly viewed as a way to share sensitive data between institutions. While there has been work on developing differentially private generative modeling approaches, these approaches generally lead to sub-par sample quality, limiting their use in real-world applications. Another line of work has focused on developing generative models that produce higher-quality samples but currently lack any formal privacy guarantees. In this work, we propose the first formal framework for membership privacy estimation in generative models. We formulate the membership privacy risk as a statistical divergence between training samples and hold-out samples, and propose sample-based methods to estimate this divergence. Compared with previous work, our framework makes more realistic and flexible assumptions. First, we offer a generalizable metric as an alternative to the accuracy metric, especially for imbalanced datasets. Second, we loosen the assumption in previous studies of having full access to the underlying distribution and propose sample-based estimations with theoretical guarantees. Third, along with population-level membership privacy risk estimation via the optimal membership advantage, we offer individual-level estimation via the individual privacy risk. Fourth, our framework allows adversaries to access the trained model via a customized query, whereas prior work requires specific attributes.
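As an illustration of membership advantage, the sketch below assumes that an attacker assigns each sample a scalar score (for example, a discriminator output) and predicts "member" when the score exceeds a threshold. The advantage of that rule is TPR minus FPR, and taking the best threshold gives a simple sample-based estimate. This is a generic threshold attack, not MACE's estimator, and the function name is hypothetical.

```python
def empirical_membership_advantage(member_scores, nonmember_scores):
    """Best TPR - FPR over thresholds for the rule 'score >= t means member'.

    member_scores come from training samples, nonmember_scores from hold-out
    samples. For a single scalar score this equals the Kolmogorov-Smirnov
    statistic between the two empirical score distributions.
    """
    thresholds = sorted(set(member_scores) | set(nonmember_scores))
    best = 0.0
    for t in thresholds:
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        best = max(best, tpr - fpr)
    return best
```

An advantage of 0 means the scores reveal nothing about membership. An advantage of 1 means training and hold-out samples are perfectly separable.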
Lbl2Vec: An Embedding-Based Approach for Unsupervised Document Retrieval on Predefined Topics
Schopf, Tim, Braun, Daniel, Matthes, Florian
In this paper, we consider the task of retrieving documents on predefined topics from an unlabeled document dataset using an unsupervised approach. The proposed approach requires only a small number of keywords describing the respective topics and no labeled documents. Existing approaches rely heavily either on a large amount of additionally encoded world knowledge or on term-document frequencies. In contrast, we introduce a method that learns jointly embedded document and word vectors solely from the unlabeled document dataset in order to find documents that are semantically similar to the topics described by the keywords. The proposed method requires almost no text preprocessing and is effective at retrieving relevant documents with high probability. When successively retrieving documents on different predefined topics from publicly available and commonly used datasets, we achieved an average area under the receiver operating characteristic curve (AUC) of 0.95 on one dataset and 0.92 on another. Further, our method can be used for multiclass document classification without the need to assign labels to the dataset in advance. Compared with an unsupervised classification baseline, we increased F1 scores from 76.6 to 82.7 and from 61.0 to 75.1 on the respective datasets. For easy replication of our approach, we make the developed Lbl2Vec code publicly available as a ready-to-use tool under the 3-Clause BSD license.
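The retrieval step can be pictured with a minimal sketch that assumes word and document vectors already live in one jointly trained space. In this sketch the topic vector is the centroid of the keyword vectors, and documents are ranked by cosine similarity to it. The toy vectors and function names are hypothetical, and this is not the API of the released Lbl2Vec package.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def retrieve(keyword_vectors, doc_vectors, threshold):
    """Return (doc_id, similarity) pairs above threshold, most similar first."""
    topic = centroid(keyword_vectors)
    scored = [(doc_id, cosine(topic, vec)) for doc_id, vec in doc_vectors.items()]
    hits = [(d, s) for d, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For multiclass classification, the same scores can be computed against one topic vector per label, and each document can then be assigned to the label with the highest similarity.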
Deep Combinatorial Aggregation
Shen, Yuesong, Cremers, Daniel
Neural networks are known to produce poor uncertainty estimates, and a variety of approaches have been proposed to remedy this issue. These include deep ensemble, a simple and effective method that achieves state-of-the-art results for uncertainty-aware learning tasks. In this work, we explore a combinatorial generalization of deep ensemble called deep combinatorial aggregation (DCA). DCA creates multiple instances of network components and aggregates their combinations to produce diversified model proposals and predictions. DCA components can be defined at different levels of granularity. We find that coarse-grained DCAs can outperform deep ensemble for uncertainty-aware learning in terms of both predictive performance and uncertainty estimation. For fine-grained DCAs, we find that an average parameterization approach named deep combinatorial weight averaging (DCWA) can improve the baseline training. It is on par with stochastic weight averaging (SWA) but does not require any custom training schedule or adaptation of BatchNorm layers. Furthermore, we propose a consistency-enforcing loss that helps the training of DCWA and modelwise DCA. We experiment on in-domain, distributional shift, and out-of-distribution image classification tasks, and empirically confirm the effectiveness of the DCWA and DCA approaches.
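The combinatorial idea can be sketched with plain callables standing in for network components. The sketch assumes a model is a sequence of M stages with K trained instances each: every way of picking one instance per stage gives a model proposal, so K*M components yield K**M proposals, and their predictions are averaged. Scalars stand in for predictions here. In a real classifier, the outputs would be probability vectors averaged element-wise. The function name is hypothetical.

```python
from itertools import product


def dca_predict(component_instances, x):
    """Average the outputs of every combination of component instances.

    component_instances[m] holds the alternative callables for stage m.
    Returns (mean prediction, number of model proposals).
    """
    outputs = []
    for combo in product(*component_instances):
        h = x
        for stage in combo:  # run one proposal: a chain of chosen instances
            h = stage(h)
        outputs.append(h)
    return sum(outputs) / len(outputs), len(outputs)
```

With two stages and two instances per stage, four proposals are aggregated, whereas a two-member deep ensemble aggregates only two.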
Optimizing Evaluation Metrics for Multi-Task Learning via the Alternating Direction Method of Multipliers
Ke, Ge-Yang, Pan, Yan, Yin, Jian, Huang, Chang-Qin
Multi-task learning (MTL) aims to improve the generalization performance of multiple tasks by exploiting the shared factors among them. Various metrics (e.g., F-score, area under the ROC curve) are used to evaluate the performance of MTL methods. Most existing MTL methods try to minimize either the misclassification error for classification or the mean squared error for regression. In this paper, we propose a method to directly optimize the evaluation metrics for a large family of MTL problems. The formulation of MTL that directly optimizes evaluation metrics combines two parts: (1) a regularizer defined on the weight matrix over all tasks, in order to capture the relatedness of these tasks; (2) a sum of multiple structured hinge losses, each corresponding to a surrogate of some evaluation metric on one task. This formulation is challenging to optimize because both of its parts are non-smooth. To tackle this issue, we propose a novel optimization procedure based on the alternating direction method of multipliers (ADMM), where we decompose the whole optimization problem into a sub-problem corresponding to the regularizer and another sub-problem corresponding to the structured hinge losses. For a large family of MTL problems, the first sub-problem has closed-form solutions. To solve the second sub-problem, we propose an efficient primal-dual algorithm via coordinate ascent. Extensive evaluation results demonstrate that, in a large family of MTL problems, the proposed MTL method that directly optimizes evaluation metrics achieves superior performance compared with the corresponding baseline methods.
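The decomposition that ADMM performs can be seen on a toy scalar problem. The example splits min 0.5*(x - a)**2 + lam*|z| subject to x = z into two sub-problems, each with a closed-form update, plus a dual update. This is generic scaled-form ADMM, not the paper's solver: there, both parts are non-smooth and the second sub-problem is solved by coordinate ascent. The names `admm_toy` and `soft_threshold` are illustrative.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|z|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_toy(a, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for min 0.5*(x - a)**2 + lam*|z|  subject to  x = z."""
    x = z = u = 0.0
    for _ in range(iters):
        # Sub-problem 1: quadratic term, solved in closed form.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # Sub-problem 2: non-smooth term, closed-form proximal step.
        z = soft_threshold(x + u, lam / rho)
        # Dual update on the scaled multiplier for the constraint x = z.
        u = u + x - z
    return z
```

The iterates converge to the known optimum soft_threshold(a, lam). For a = 3 and lam = 1, that optimum is 2.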
Perturbation Augmentation for Fairer NLP
Qian, Rebecca, Ross, Candace, Fernandes, Jude, Smith, Eric, Kiela, Douwe, Williams, Adina
Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets. In this work, we ask whether training on demographically perturbed data leads to fairer language models. We collect a large dataset of human-annotated text perturbations and train a neural perturbation model, which we show outperforms heuristic alternatives. We find that (i) language models (LMs) pre-trained on demographically perturbed corpora are typically fairer, (ii) LMs finetuned on perturbed GLUE datasets exhibit less demographic bias on downstream tasks, and (iii) fairness improvements do not come at the expense of performance on downstream tasks. Lastly, we discuss outstanding questions about how best to evaluate the (un)fairness of large language models. We hope that this exploration of neural demographic perturbation will help drive further progress toward fairer NLP.
Walk a Mile in Their Shoes: a New Fairness Criterion for Machine Learning
The old empathetic adage, "Walk a mile in their shoes," asks that one imagine the difficulties others may face. This suggests a new counterfactual fairness criterion for ML at the group level: How would members of a nonprotected group fare if their group were subject to the conditions of some protected group? Instead of asking what sentence a particular Caucasian convict would receive if he were Black, extend that notion to entire groups: e.g., how would the average sentence for all White convicts change if they were Black, but with their same White characteristics, e.g., the same number of prior convictions? We frame the problem and study it empirically on different datasets. Our approach also offers a solution to the problem of covariates correlated with sensitive attributes.
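One way to compute such a group-level counterfactual is sketched below. It assumes a simple linear outcome model with a single covariate: fit the model on the protected group, apply it to the nonprotected group's own covariates, and compare the resulting average with the group's actual average. The single-covariate linear model, the function names, and the toy numbers (prior convictions vs. sentence length) are all illustrative assumptions, not the authors' estimator.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def walk_a_mile(protected_x, protected_y, nonprotected_x, nonprotected_y):
    """Actual vs. counterfactual mean outcome for the nonprotected group.

    The counterfactual keeps the nonprotected group's own covariates but
    applies the outcome model fitted on the protected group.
    """
    intercept, slope = fit_line(protected_x, protected_y)
    counterfactual = sum(intercept + slope * x for x in nonprotected_x) / len(nonprotected_x)
    actual = sum(nonprotected_y) / len(nonprotected_y)
    return actual, counterfactual


# Hypothetical data: prior convictions (x) and sentence length in months (y).
actual, cf = walk_a_mile([0, 1, 2, 3], [12, 14, 16, 18], [1, 2], [10, 12])
```

Because the nonprotected group keeps its own covariates, any gap between `actual` and `cf` reflects how the two groups are treated rather than differences in their characteristics, under the linear-model assumption.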
Selective Classification Via Neural Network Training Dynamics
Rabanser, Stephan, Thudi, Anvith, Hamidieh, Kimia, Dziedzic, Adam, Papernot, Nicolas
Machine learning (ML) is increasingly deployed in high-stakes decision-making environments, where it is critical to detect inputs that the model could misclassify. This is particularly true when deploying deep neural networks (DNNs) for applications with low tolerance for false positives (i.e., classifying with a wrong label), such as healthcare (Challen et al., 2019; Mozannar and Sontag, 2020), self-driving (Ghodsi et al., 2021), and law (Vieira et al., 2021). This problem setup is captured by the selective classification (SC) framework, which introduces a gating mechanism to abstain from predicting on individual test points (Geifman and El-Yaniv, 2017). Specifically, SC aims to (i) only accept inputs on which the ML model would achieve high accuracy, while (ii) maintaining high coverage, i.e., accepting as many inputs as possible. Current selective classification techniques take one of two directions: (i) augmenting the architecture of the underlying ML model (Geifman and El-Yaniv, 2019); or (ii) training the model with a purposefully adapted loss function (Gangrade et al., 2021). The unifying principle behind these methods is to modify the training stage in order to accommodate selective classification. In this work, we instead show that these modifications are unnecessary: our method not only matches or outperforms existing work but is also the only state-of-the-art (SOTA) approach that can be applied to all existing models. Our approach builds on the following observation: when a model is sequentially optimized on one dataset, there is in fact a larger set of datapoints on which the model is also sequentially optimized (Hardt et al., 2016; Bassily et al., 2020; Thudi et al., 2022).
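The two quantities SC trades off, coverage and accuracy on accepted inputs, can be computed as in the sketch below. It assumes a generic gate that accepts an input when its confidence score reaches a threshold. The confidence source here is unspecified (for example, the maximum softmax probability) and is not the training-dynamics score proposed in this work; the function name is hypothetical.

```python
def selective_metrics(confidences, predictions, labels, threshold):
    """Coverage and selective accuracy for the gate 'accept if confidence >= threshold'.

    Returns (coverage, accuracy on accepted inputs); the accuracy is None
    when the gate rejects every input.
    """
    accepted = [i for i, c in enumerate(confidences) if c >= threshold]
    coverage = len(accepted) / len(confidences)
    if not accepted:
        return coverage, None
    correct = sum(predictions[i] == labels[i] for i in accepted)
    return coverage, correct / len(accepted)
```

Sweeping `threshold` from high to low traces the coverage/accuracy trade-off. A stricter gate usually raises selective accuracy at the cost of coverage.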
How To Solve A Classification Task With Machine Learning
By now, I'm sure you've heard the term Machine Learning thrown around a lot. Since most big companies and financial institutions rely on data to operate at such a large scale, it's no wonder that fields in data science are taking off. But what exactly is Machine Learning, and how can we use it in a practical sense?
"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed."
In my opinion, the quote above is the best general definition of what machine learning does.