 Accuracy


Efficient candidate screening under multiple tests and implications for fairness

arXiv.org Machine Learning

When recruiting job candidates, employers rarely observe their underlying skill level directly. Instead, they must administer a series of interviews and/or collate other noisy signals in order to estimate the worker's skill. Traditional economics papers address screening models where employers access worker skill via a single noisy signal. In this paper, we extend this theoretical analysis to a multi-test setting, considering both Bernoulli and Gaussian models. We analyze the optimal employer policy both when the employer sets a fixed number of tests per candidate and when the employer can set a dynamic policy, assigning further tests adaptively based on results from the previous tests. To start, we characterize the optimal policy when employees constitute a single group, demonstrating some interesting trade-offs. Subsequently, we address the multi-group setting, demonstrating that when the noise levels vary across groups, a fundamental impossibility emerges whereby we cannot administer the same number of tests, subject candidates to the same decision rule, and yet realize the same outcomes in both groups.
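One way to see why group-dependent noise matters is a short derivation under an assumed Gaussian signal model. The notation below ($\theta$, $\sigma_g$, $k$, $s_i$) is illustrative and not taken from the paper; it only shows that equal test counts give unequal estimate precision when noise levels differ.

```latex
% Assumption: each test returns s_i = \theta + \varepsilon_i with
% independent noise \varepsilon_i \sim \mathcal{N}(0, \sigma_g^2) for group g.
\[
  \bar{s}_k = \frac{1}{k}\sum_{i=1}^{k} s_i ,
  \qquad
  \operatorname{Var}\!\left(\bar{s}_k \mid \theta\right) = \frac{\sigma_g^2}{k} .
\]
% Two groups A and B have equally precise averages only if
\[
  \frac{\sigma_A^2}{k_A} = \frac{\sigma_B^2}{k_B}
  \quad\Longleftrightarrow\quad
  \frac{k_A}{k_B} = \frac{\sigma_A^2}{\sigma_B^2} ,
\]
% so with \sigma_A \neq \sigma_B, the same number of tests (k_A = k_B)
% leaves one group with a noisier skill estimate.
```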


Classification Accuracy Score for Conditional Generative Models

arXiv.org Machine Learning

Deep generative models (DGMs) of images are now sufficiently mature that they produce nearly photorealistic samples and obtain scores similar to the data distribution on heuristics such as Fréchet Inception Distance. These results, especially on large-scale datasets such as ImageNet, suggest that DGMs are learning the data distribution in a perceptually meaningful space, and can be used in downstream tasks. To test this latter hypothesis, we use class-conditional generative models from a number of model classes (variational autoencoders, autoregressive models, and generative adversarial networks) to infer the class labels of real data. We perform this inference by training an image classifier using only synthetic data and using the classifier to predict labels on real data. The performance on this task, which we call Classification Accuracy Score (CAS), highlights some surprising results that are not captured by traditional metrics and that comprise our contributions. First, when using a state-of-the-art GAN (BigGAN), Top-5 accuracy decreases by 41.6% compared to the original data, while conditional generative models from other model classes, such as high-resolution VQ-VAE and Hierarchical Autoregressive Models, substantially outperform GANs on this benchmark. Second, CAS automatically surfaces particular classes for which generative models failed to capture the data distribution, failures that were previously unknown in the literature. Third, we find traditional GAN metrics such as Fréchet Inception Distance neither predictive of CAS nor useful when evaluating non-GAN models. Finally, we introduce Naive Augmentation Score, a variant of CAS where the image classifier is trained on both real and synthetic data, to demonstrate that naive augmentation improves classification performance in limited circumstances. In order to facilitate better diagnoses of generative models, we open-source the proposed metric.
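The train-on-synthetic, test-on-real protocol can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: a nearest-centroid classifier stands in for the image classifier, the 2-D points stand in for images, and all function names are hypothetical.

```python
def nearest_centroid_fit(samples, labels):
    """Return {label: centroid} computed from the training samples."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def classification_accuracy_score(synth_x, synth_y, real_x, real_y):
    """Train only on synthetic data, then report accuracy on real data."""
    centroids = nearest_centroid_fit(synth_x, synth_y)
    correct = sum(
        nearest_centroid_predict(centroids, x) == y
        for x, y in zip(real_x, real_y)
    )
    return correct / len(real_y)


# Toy stand-ins: "synthetic" samples from a hypothetical conditional generator
# and held-out "real" samples with ground-truth labels.
synth_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
synth_y = [0, 0, 1, 1]
real_x = [(0.2, 0.4), (4.8, 5.3), (0.1, 0.9), (5.2, 5.9)]
real_y = [0, 1, 0, 1]

cas = classification_accuracy_score(synth_x, synth_y, real_x, real_y)
```

A generator whose samples carry the wrong class-conditional structure would lower this score even if individual samples looked realistic, which is the kind of failure the abstract says traditional metrics miss.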


Evaluating Generative Models Using Divergence Frontiers

arXiv.org Machine Learning

Despite the tremendous progress in the estimation of generative models, the development of tools for diagnosing their failures and assessing their performance has advanced at a much slower pace. Recent developments have investigated metrics that quantify which parts of the true distribution are modeled well and, conversely, what the model fails to capture, akin to precision and recall in information retrieval. In this paper, we present a general evaluation framework for generative models that measures the trade-off between precision and recall using Rényi divergences. Our framework provides a novel perspective on existing techniques and extends them to more general domains. As a key advantage, it allows for efficient algorithms that are directly applicable to continuous distributions without discretization. We further showcase the proposed techniques on a set of image synthesis models.


Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration

arXiv.org Machine Learning

In this paper, we consider the nonparametric least square regression in a Reproducing Kernel Hilbert Space (RKHS). We propose a new randomized algorithm that has optimal generalization error bounds with respect to the square loss, closing a long-standing gap between upper and lower bounds. Moreover, we show that our algorithm has faster finite-time and asymptotic rates on problems where the Bayes risk with respect to the square loss is small. We state our results using standard tools from the theory of least square regression in RKHSs, namely, the decay of the eigenvalues of the associated integral operator and the complexity of the optimal predictor measured through the integral operator.


Dual Averaging Method for Online Graph-structured Sparsity

arXiv.org Artificial Intelligence

Online learning algorithms update models using one sample per iteration, which makes them efficient for processing large-scale datasets and useful for detecting events such as disease outbreaks and traffic congestion on the fly for social benefit. However, existing algorithms for graph-structured models focus on the offline setting and the least square loss and cannot handle the online setting, while methods designed for the online setting cannot be directly applied to complex (usually non-convex) graph-structured sparsity models. To address these limitations, in this paper we propose a new algorithm for graph-structured sparsity constraint problems in the online setting, which we call GraphDA. The key part of GraphDA is to project both the averaged gradient (in dual space) and the primal variables (in primal space) onto lower-dimensional subspaces, thus capturing the graph-structured sparsity effectively. Furthermore, the objective functions assumed here are generally convex so as to handle different losses for online learning settings. To the best of our knowledge, GraphDA is the first online learning algorithm for graph-structure constrained optimization problems. To validate our method, we conduct extensive experiments on both benchmark graph and real-world graph datasets. Our experimental results show that, compared to other baseline methods, GraphDA not only improves classification performance but also captures graph-structured features more effectively, and hence offers stronger interpretability.


Average Individual Fairness: Algorithms, Generalization and Experiments

arXiv.org Machine Learning

We propose a new family of fairness definitions for classification problems that combine some of the best properties of both statistical and individual notions of fairness. We posit not only a distribution over individuals, but also a distribution over (or collection of) classification tasks. We then ask that standard statistics (such as error or false positive/negative rates) be (approximately) equalized across individuals, where the rate is defined as an expectation over the classification tasks. Because we are no longer averaging over coarse groups (such as race or gender), this is a semantically meaningful individual-level constraint. Given a sample of individuals and classification problems, we design an oracle-efficient algorithm (i.e., one that is given access to any standard, fairness-free learning heuristic) for the fair empirical risk minimization task. We also show that, given sufficiently many samples, the ERM solution generalizes in two directions: both to new individuals and to new classification tasks, drawn from their corresponding distributions. Finally, we implement our algorithm and empirically verify its effectiveness.
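The per-individual statistic described above can be illustrated with a small computation. This sketch assumes a finite collection of tasks weighted uniformly (the abstract allows a general distribution) and uses hypothetical function names; it computes only the quantity being equalized, not the authors' learning algorithm.

```python
def individual_error_rates(predictions, labels):
    """Error rate of each individual, averaged over tasks.

    predictions[i][j] and labels[i][j] are the predicted and true labels
    for individual i on classification task j.
    """
    return [
        sum(p != y for p, y in zip(pred_row, label_row)) / len(label_row)
        for pred_row, label_row in zip(predictions, labels)
    ]


def max_fairness_gap(rates):
    """Largest difference in per-individual error rates (0 means equalized)."""
    return max(rates) - min(rates)


# Two individuals, four tasks each (toy values).
predictions = [[1, 0, 1, 1], [0, 0, 1, 0]]
labels = [[1, 0, 0, 1], [0, 1, 1, 1]]

rates = individual_error_rates(predictions, labels)
gap = max_fairness_gap(rates)
```

Here the first individual is misclassified on one of four tasks and the second on two of four, so a constraint of this kind would ask the learner to shrink the resulting gap.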


ASPIRE: Automated Security Policy Implementation Using Reinforcement Learning

arXiv.org Artificial Intelligence

Malware detection is an ever-present challenge for all organizational gatekeepers. Organizations often deploy numerous different malware detection tools, and then combine their output to produce a final classification for an inspected file. This approach has two significant drawbacks. First, it requires large amounts of computing resources and time since every incoming file needs to be analyzed by all detectors. Second, it is difficult to accurately and dynamically enforce a predefined security policy that comports with the needs of each organization (e.g., how tolerant the organization is to false negatives and false positives). In this study we propose ASPIRE, a reinforcement learning (RL)-based method for malware detection. Our approach receives the organizational policy, defined solely by the perceived costs of correct/incorrect classifications and of computing resources, and then dynamically assigns detection tools and sets the detection threshold for each inspected file. We demonstrate the effectiveness and robustness of our approach by conducting an extensive evaluation on multiple organizational policies. ASPIRE performed well in all scenarios, even achieving near-optimal accuracy of 96.21% (compared to an optimum of 96.86%) at approximately 20% of the running time of this baseline.


Learning Surrogate Losses

arXiv.org Machine Learning

The minimization of loss functions is the heart and soul of Machine Learning. In this paper, we propose an off-the-shelf optimization approach that can minimize virtually any non-differentiable and non-decomposable loss function (e.g. Misclassification Rate, AUC, F1, Jaccard Index, Matthews Correlation Coefficient, etc.) seamlessly. Our strategy learns smooth relaxation versions of the true losses by approximating them through a surrogate neural network. The proposed loss networks are set-wise models which are invariant to the order of mini-batch instances. Ultimately, the surrogate losses are learned jointly with the prediction model via bilevel optimization. Empirical results on multiple datasets with diverse real-life loss functions compared with state-of-the-art baselines demonstrate the efficiency of learning surrogate losses.


Magnetoresistive RAM for error resilient XNOR-Nets

arXiv.org Machine Learning

We trained three Binarized Convolutional Neural Network architectures (LeNet-4, Network-In-Network, AlexNet) on a variety of datasets (MNIST, CIFAR-10, CIFAR-100, extended SVHN, ImageNet) using error-prone activations and tested them without errors to study the resilience of the training process. With the exception of the AlexNet when trained on the ImageNet dataset, we found that Bit Error Rates of a few percent during training do not degrade the test accuracy. Furthermore, by training the AlexNet on progressively smaller subsets of ImageNet classes, we observed increasing tolerance to activation errors. The ability to operate with high BERs is critical for reducing power consumption in existing hardware and for facilitating emerging memory technologies. We discuss how operating at moderate BER can enable Magnetoresistive RAM with higher endurance, speed and density.
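To make the notion of a Bit Error Rate on activations concrete, the following sketch flips each binary activation independently with probability `ber`. It assumes activations take values in {-1, +1} and that errors are independent, neither of which is stated in the abstract, and it shows only the error-injection step, not the training procedure.

```python
import random


def inject_bit_errors(activations, ber, rng):
    """Flip the sign of each +/-1 activation independently with probability ber."""
    return [-a if rng.random() < ber else a for a in activations]


rng = random.Random(0)      # seeded for reproducibility
clean = [1] * 10_000        # toy binarized activations
noisy = inject_bit_errors(clean, ber=0.05, rng=rng)

# Fraction of activations that were actually flipped.
observed_ber = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
```

In an experiment of the kind described, such noise would be applied to activations during training only, with the trained network then evaluated without errors.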


Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs

arXiv.org Machine Learning

Content-based news recommendation systems need to recommend news articles based on the topics and content of articles without using user-specific information. Many news articles describe the occurrence of specific events and named entities including people, places or objects. In this paper, we propose a graph traversal algorithm as well as a novel weighting scheme for cold-start content-based news recommendation utilizing these named entities. Seeking to create a higher degree of user-specific relevance, our algorithm computes the shortest distance between named entities, across news articles, over a large knowledge graph. Moreover, we have created a new human-annotated data set for evaluating content-based news recommendation systems. Experimental results show that our method is suitable for tackling the hard cold-start problem and that it produces stronger Pearson correlation to human similarity scores than other cold-start methods. Our method is also complementary, and a combination with the conventional cold-start recommendation methods may yield significant performance gains. The dataset, CNRec, is available at: https://github.com/kevinj22/CNRec
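The shortest distance between two named entities can be sketched as a breadth-first search over an unweighted graph. The toy graph and the function name below are hypothetical, and the paper's weighting scheme and its aggregation across articles are not reproduced here.

```python
from collections import deque


def shortest_entity_distance(graph, source, target):
    """Number of edges on the shortest path from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the entities are not connected


# Toy undirected knowledge graph stored as adjacency lists.
toy_graph = {
    "Toronto": ["Canada", "Ontario"],
    "Ontario": ["Toronto", "Canada"],
    "Canada": ["Toronto", "Ontario", "Ottawa"],
    "Ottawa": ["Canada"],
    "Mars": [],
}

d = shortest_entity_distance(toy_graph, "Toronto", "Ottawa")
```

Two articles mentioning entities that are close in the graph would then be treated as more related than articles whose entities are far apart or disconnected.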