

 Performance Analysis


Training wide residual networks for deployment using a single bit for each weight

arXiv.org Machine Learning

For fast and energy-efficient deployment of trained deep neural networks on resource-constrained embedded hardware, each learned weight parameter should ideally be represented and stored using a single bit. Error rates usually increase when this requirement is imposed. Here, we report large improvements in error rates on multiple datasets for deep convolutional neural networks deployed with 1-bit-per-weight. Using wide residual networks as our main baseline, our approach simplifies existing methods that binarize weights by applying the sign function in training: we apply a scaling factor to each layer, with constant, unlearned values equal to the layer-specific standard deviations used for initialization. For CIFAR-10, CIFAR-100 and ImageNet, and models with 1-bit-per-weight requiring less than 10 MB of parameter memory, we achieve error rates of 3.9%, 18.5% and 26.0% / 8.5% (Top-1 / Top-5) respectively. We also considered MNIST, SVHN and ImageNet32, achieving 1-bit-per-weight test error rates of 0.27%, 1.9%, and 41.3% / 19.1% respectively. For CIFAR, our error rates halve previously reported values and are within about 1% of our error rates for the same network with full-precision weights. For networks that overfit, we also show significant improvements in error rate by not learning batch normalization scale and offset parameters. This applies to both full-precision and 1-bit-per-weight networks. Using a warm-restart learning-rate schedule, we found that training 1-bit-per-weight networks is just as fast as training full-precision networks, with better accuracy than standard schedules, and achieved about 98%-99% of peak performance in just 62 training epochs for CIFAR-10/100. For full training code and trained models in MATLAB, Keras and PyTorch, see https://github.com/McDonnell-Lab/1-bit-per-weight/ .
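The fixed per-layer scaling described above can be sketched in a few lines. This is a minimal forward-pass sketch, assuming He-style initialization with standard deviation sqrt(2 / fan_in); the function name `binarize_layer` and the choice to map sign(0) to +1 are illustrative assumptions, not taken from the released code. In training setups of this kind, full-precision weights are typically kept for the gradient updates and only the forward pass uses the binarized values.

```python
import math


def binarize_layer(weights, fan_in):
    """Map each weight to +/- a fixed, unlearned per-layer scale.

    Assumption: the scale is the He-style initialization std sqrt(2 / fan_in).
    Only the sign of each weight survives, so 1 bit per weight is stored.
    """
    scale = math.sqrt(2.0 / fan_in)  # constant for the layer, never learned
    # sign(w) * scale, with sign(0) mapped to +1 (an arbitrary choice here)
    return [scale if w >= 0 else -scale for w in weights]


# A 3x3 convolution with 16 input channels has fan_in = 3 * 3 * 16 = 144.
w = [0.31, -0.02, 0.0, -1.7]
print(binarize_layer(w, fan_in=144))
```

Because the scale depends only on the layer shape, it can be recomputed at deployment time, so nothing beyond the sign bits needs to be stored for the weights.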


Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples

arXiv.org Machine Learning

The problem of detecting whether a test sample comes from the in-distribution (i.e., the distribution the classifier was trained on) or from an out-of-distribution sufficiently different from it arises in many real-world machine learning applications. However, state-of-the-art deep neural networks are known to be highly overconfident in their predictions, i.e., they do not distinguish in- and out-of-distributions. Recently, to handle this issue, several threshold-based detectors have been proposed for pre-trained neural classifiers. However, the performance of prior works depends heavily on how the classifiers are trained, since they focus only on improving inference procedures. In this paper, we develop a novel training method for classifiers so that such inference algorithms can work better. In particular, we suggest two additional terms added to the original loss (e.g., cross entropy). The first forces the classifier to be less confident on samples from the out-of-distribution, and the second (implicitly) generates the most effective training samples for the first. In essence, our method jointly trains both classification and generative neural networks for out-of-distribution detection. We demonstrate its effectiveness using deep convolutional neural networks on various popular image datasets.
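The first additional loss term can be illustrated with a small sketch. It assumes the "less confident" penalty takes the form of a KL divergence from the uniform distribution to the classifier's softmax output on an out-of-distribution sample, weighted by a hypothetical coefficient `beta`; the function names are illustrative, and the generative second term is not shown.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def kl_uniform_to(probs):
    """KL(U || p): zero when p is uniform, large when p is confident."""
    k = len(probs)
    return sum((1.0 / k) * math.log((1.0 / k) / p) for p in probs)


def confidence_loss(in_logits, in_label, out_logits, beta=1.0):
    # Usual classification term on an in-distribution sample
    ce = cross_entropy(in_logits, in_label)
    # Penalty pushing predictions on an out-of-distribution sample toward uniform
    return ce + beta * kl_uniform_to(softmax(out_logits))


# A uniform prediction on the OOD sample adds no penalty; a confident one does.
print(confidence_loss([2.0, 0.5, -1.0], 0, [0.0, 0.0, 0.0]))
print(confidence_loss([2.0, 0.5, -1.0], 0, [5.0, 0.0, 0.0]))
```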


High-Dimensional Vector Semantics

arXiv.org Artificial Intelligence

In many natural language processing tasks, words and documents are represented using the "bag of words" model. In such a model, a document is represented by a high-dimensional vector whose components correspond to the frequencies of particular words in the document (for a detailed discussion see [1-3] and the references therein). For example, assuming an English vocabulary of 25,000 words, each document will be represented by a 25,000-dimensional vector, where component i is the frequency of the i-th word in the document. The vector representation is particularly useful in text classification tasks, where the similarity of two documents can be estimated simply by the dot product between their vectors. If the vectors are normalized, their dot product equals the cosine of the angle between them; therefore, the more parallel the vectors are, the more similar the documents are.
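As a small illustration of the bag-of-words representation and cosine similarity, the sketch below uses a toy five-word vocabulary in place of the 25,000-word one; the vocabulary, sample documents, and function names are hypothetical. Words outside the vocabulary are simply ignored, and both vectors are assumed to be non-zero.

```python
import math
from collections import Counter


def bag_of_words(doc, vocab):
    # Component i is the count of vocab[i] in the document
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]


def cosine(u, v):
    # Dot product of the normalized vectors (assumes neither vector is zero)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


vocab = ["cat", "dog", "sat", "mat", "ran"]
a = bag_of_words("the cat sat on the mat", vocab)  # [1, 0, 1, 1, 0]
b = bag_of_words("the dog sat on the mat", vocab)  # [0, 1, 1, 1, 0]
c = bag_of_words("the dog ran", vocab)             # [0, 1, 0, 0, 1]
print(cosine(a, b), cosine(a, c))
```

Documents a and b share two of their three vocabulary words, giving a cosine of 2/3, while a and c share none, giving 0.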


Vote-boosting ensembles

arXiv.org Machine Learning

Vote-boosting is a sequential ensemble learning method in which the individual classifiers are built on different weighted versions of the training data. To build a new classifier, the weight of each training instance is determined by the degree of disagreement among the current ensemble predictions for that instance. For low class-label noise levels, especially when simple base learners are used, emphasis should be placed on instances for which the disagreement rate is high. When more flexible classifiers are used and as the noise level increases, the emphasis on these uncertain instances should be reduced. In fact, at sufficiently high levels of class-label noise, the focus should be on instances on which the ensemble classifiers agree. The optimal type of emphasis can be determined automatically using cross-validation. An extensive empirical analysis using the beta distribution as the emphasis function illustrates that vote-boosting is an effective method for generating ensembles that are both accurate and robust.
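The beta-distribution emphasis function can be sketched as follows. This rests on assumptions the abstract does not spell out: the emphasis is taken to be evaluated at the fraction of current ensemble members that predict an instance's true label, smoothed as (t + 0.5) / (T + 1) so the density stays finite at the extremes, and the name `vote_boosting_weights` is hypothetical. With a = b > 1 the density peaks at 0.5 (high disagreement); with a = b < 1 it is U-shaped, favoring instances on which the ensemble agrees.

```python
import math


def beta_pdf(x, a, b):
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * x ** (a - 1) * (1 - x) ** (b - 1)


def vote_boosting_weights(correct_votes, ensemble_size, a, b):
    """Normalized instance weights from the current ensemble's votes.

    correct_votes[i]: how many of the ensemble_size members predict the true
    label of instance i. Smoothing keeps the argument strictly inside (0, 1).
    """
    raw = [beta_pdf((t + 0.5) / (ensemble_size + 1), a, b) for t in correct_votes]
    total = sum(raw)
    return [r / total for r in raw]


votes = [0, 5, 10]  # full disagreement with truth, a split vote, full agreement
print(vote_boosting_weights(votes, 10, a=2.0, b=2.0))  # middle instance heaviest
print(vote_boosting_weights(votes, 10, a=0.5, b=0.5))  # extremes heaviest
```

The parameters a and b play the role of the "type of emphasis" that the abstract says can be chosen by cross-validation.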


Adversarial classification: An adversarial risk analysis approach

arXiv.org Machine Learning

Classification is one of the most widely used instances of supervised learning, with applications in numerous fields including spam detection (Fan et al., 2016), computer vision (Chen, 2015), and genomics (Zhou et al., 2005). In recent years, the field has experienced enormous growth, becoming a major research area in statistics and machine learning (Efron and Hastie, 2016). Most efforts in classification have focused on obtaining more accurate algorithms, which, however, largely ignore a relevant issue in many applications: the presence of adversaries who actively manipulate the data to fool the classifier in order to attain a benefit. For example, when a spammer makes the classifier treat a spam message as legitimate, he may profit by selling the information he obtains from the victim. In such contexts, as classification algorithms improve, adversaries usually become smarter in their attacks.


Cross-Modality Synthesis from CT to PET using FCN and GAN Networks for Improved Automated Lesion Detection

arXiv.org Artificial Intelligence

In this work we present a novel system for generating virtual PET images from CT scans. We combine a fully convolutional network (FCN) with a conditional generative adversarial network (GAN) to generate simulated PET data from given input CT data. The synthesized PET can be used for false-positive reduction in lesion detection solutions. Clinically, such solutions may enable lesion detection and drug treatment evaluation in a CT-only environment, thus reducing the need for the more expensive and radioactive PET/CT scan. Our dataset includes 60 PET/CT scans from Sheba Medical Center. We used 23 scans for training and 37 for testing. Different schemes for producing the synthesized output were compared qualitatively. Quantitative evaluation was conducted using existing lesion detection software, with the synthesized PET serving as a false-positive reduction layer for the detection of malignant lesions in the liver. Current results look promising, showing a 28% reduction in the average number of false positives per case, from 2.9 to 2.1. The suggested solution is comprehensive and can be extended to additional body organs and different modalities.
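The reported 28% figure is the relative drop in the average number of false positives per case. A quick check of the arithmetic (the helper name is illustrative):

```python
def relative_reduction(before, after):
    # Fraction by which `after` is lower than `before`
    return (before - after) / before


r = relative_reduction(2.9, 2.1)
print(f"{r:.1%}")  # 27.6%, which rounds to the reported 28%
```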


WWE Elimination Chamber 2018: Predictions, Matches For Final 'Raw' PPV Before WrestleMania 34

International Business Times

When WWE Elimination Chamber 2018 takes place Sunday night in Las Vegas, it'll be the final "Monday Night Raw" pay-per-view before WrestleMania 34. Seven of the best wrestlers in the entire company will compete to determine the main event of the year's biggest show, and Ronda Rousey will make an appearance to officially become a member of the WWE roster. Below are predictions for every match on the WWE Elimination Chamber card, though more matches could be added before the PPV begins. Men's Elimination Chamber Match (Winner to face Brock Lesnar for the Universal Title at WrestleMania): We've known for nearly a year that Roman Reigns would challenge Lesnar for the title in the WrestleMania 34 main event, and this is how the Shield member is going to get his opportunity. The real question is how everyone else will be eliminated.


Learning to Abstain via Curve Optimization

arXiv.org Machine Learning

In practical applications of machine learning, it is often desirable to identify and abstain on examples where a model's predictions are likely to be incorrect. We consider the problem of selecting a budget-constrained subset of test examples to abstain on, with the goal of maximizing performance on the remaining examples. We develop a novel approach to this problem by analytically optimizing the expected marginal improvement in a desired performance metric, such as the area under the ROC curve or the Precision-Recall curve. We compare our approach to other abstention techniques for deep learning models based on posterior probability and on uncertainty estimates obtained using test-time dropout. On various tasks in computer vision, natural language processing, and bioinformatics, we demonstrate the consistent effectiveness of our approach over other techniques. We also introduce novel diagnostics based on influence functions to understand the behavior of abstention methods in the presence of noisy training data, and leverage the insights to propose a new influence-based abstention method.
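For reference, the posterior-probability baseline mentioned above can be sketched as abstaining on the `budget` examples with the lowest predicted probability for the chosen class. This is the comparison baseline, not the authors' curve-optimization method. The function names, probabilities, and correctness labels below are illustrative.

```python
def abstain_by_confidence(probs, budget):
    """Return (kept, abstained) index lists.

    probs[i] is the model's predicted probability for its chosen class on
    example i; the `budget` least-confident examples are abstained on.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    abstained = sorted(order[:budget])
    kept = sorted(order[budget:])
    return kept, abstained


def accuracy(indices, correct):
    return sum(correct[i] for i in indices) / len(indices)


probs = [0.99, 0.55, 0.80, 0.51, 0.95]
correct = [True, False, True, False, True]
kept, abstained = abstain_by_confidence(probs, budget=2)
print(abstained, accuracy(kept, correct))  # [1, 3] 1.0
```

In this toy case, accuracy rises from 3/5 on all examples to 1.0 on the kept ones; the curve-optimization approach instead targets a chosen metric such as the area under the ROC curve.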


Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator

arXiv.org Machine Learning

Measuring the divergence between two distributions is essential in machine learning and statistics and has various applications, including binary classification, change point detection, and two-sample testing. Furthermore, in the era of big data, designing a divergence measure that is interpretable and can handle high-dimensional and complex data has become extremely important. In this paper, we propose a post selection inference (PSI) framework for divergence measures, which can select a set of statistically significant features that discriminate between two distributions. Specifically, we employ an additive variant of maximum mean discrepancy (MMD) for features and introduce a general hypothesis test for PSI. We also propose and theoretically analyze a novel MMD estimator based on incomplete U-statistics, which has an asymptotically Normal distribution (under mild assumptions) and gives high detection power in PSI. Through synthetic and real-world feature selection experiments, we show that the proposed framework can successfully detect statistically significant features. Lastly, we propose a sample selection framework for analyzing different members of the Generative Adversarial Network (GAN) family.
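An incomplete U-statistic estimate of MMD² averages the usual U-statistic core over a random subset of index pairs rather than over all pairs. The sketch below assumes a single scalar feature, a Gaussian kernel with bandwidth `sigma`, and paired samples of equal size; the function names are illustrative. An additive, per-feature variant would presumably apply such an estimate to each feature separately.

```python
import math
import random


def gaussian_kernel(a, b, sigma=1.0):
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def h(i, j, x, y, sigma):
    # U-statistic core for MMD^2 on the pairs (x_i, y_i) and (x_j, y_j)
    return (gaussian_kernel(x[i], x[j], sigma) + gaussian_kernel(y[i], y[j], sigma)
            - gaussian_kernel(x[i], y[j], sigma) - gaussian_kernel(x[j], y[i], sigma))


def mmd2_incomplete(x, y, num_pairs, sigma=1.0, seed=0):
    """Average h over `num_pairs` randomly drawn index pairs with i != j."""
    n = len(x)  # assumes len(x) == len(y)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += h(i, j, x, y, sigma)
    return total / num_pairs


data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(200)]
y_same = [data_rng.gauss(0, 1) for _ in range(200)]
y_shift = [data_rng.gauss(2, 1) for _ in range(200)]
print(mmd2_incomplete(x, y_same, num_pairs=2000))   # close to 0
print(mmd2_incomplete(x, y_shift, num_pairs=2000))  # clearly positive
```

Using far fewer than the n(n - 1)/2 possible pairs lowers the cost from quadratic in n to linear in the number of sampled pairs. The abstract notes that this estimator is asymptotically Normal, which is what enables the hypothesis test.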


Learning Adversarially Fair and Transferable Representations

arXiv.org Machine Learning

In this work, we advocate for representation learning as the key to mitigating unfair prediction outcomes downstream. We envision a scenario where learned representations may be handed off to other entities with unknown objectives. We propose and explore adversarial representation learning as a natural method of ensuring those entities will act fairly, and connect group fairness (demographic parity, equalized odds, and equal opportunity) to different adversarial objectives. Through worst-case theoretical guarantees and experimental validation, we show that the choice of this objective is crucial to fair prediction. Furthermore, we present the first in-depth experimental demonstration of fair transfer learning, by showing that our learned representations admit fair predictions on new tasks while maintaining utility, an essential goal of fair representation learning.
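Two of the group-fairness criteria named above can be measured directly on a classifier's binary predictions. The sketch computes the demographic parity gap and the equal opportunity gap for a binary group attribute. It shows only these evaluation metrics, not the adversarial representation learning itself, and the toy predictions, labels, and group assignments are hypothetical.

```python
def rate(preds, mask):
    # Fraction of positive predictions among the selected examples
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected)


def demographic_parity_gap(preds, groups):
    """|P(yhat=1 | a=0) - P(yhat=1 | a=1)| for binary predictions."""
    return abs(rate(preds, [g == 0 for g in groups])
               - rate(preds, [g == 1 for g in groups]))


def equal_opportunity_gap(preds, labels, groups):
    """Gap in true-positive rates between groups (examples with y = 1 only)."""
    tpr0 = rate(preds, [g == 0 and y == 1 for g, y in zip(groups, labels)])
    tpr1 = rate(preds, [g == 1 and y == 1 for g, y in zip(groups, labels)])
    return abs(tpr0 - tpr1)


preds = [1, 0, 1, 1, 1, 0, 0, 0]
labels = [1, 0, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(preds, groups))         # 0.5
print(equal_opportunity_gap(preds, labels, groups))  # 0.0
```

The toy example shows why the choice of criterion matters: the same predictions violate demographic parity but satisfy equal opportunity.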