Fundamental Issues Regarding Uncertainties in Artificial Neural Networks
Thacker, Neil A., Twining, Carole J., Tar, Paul D., Notley, Scott, Ramesh, Visvanathan
Artificial Neural Networks (ANNs) implement a specific form of multivariate extrapolation and will generate an output for any input pattern, even when there is no similar training pattern. Extrapolations are not necessarily to be trusted, and in order to support safety-critical systems, we require such systems to give an indication of the training-sample-related uncertainty associated with their output. Some readers may think that this is a well-known issue that is already covered by the basic principles of pattern recognition. We explain below why this is not the case and why the conventional (likelihood estimate of) conditional probability of classification does not correctly assess this uncertainty. We provide a discussion of the standard interpretations of this problem and show how a quantitative approach based upon long-standing methods can be practically applied. The methods are illustrated on the task of early diagnosis of dementing diseases using Magnetic Resonance Imaging.
TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems
Roy, Sourjya, Sridharan, Shrihari, Jain, Shubham, Raghunathan, Anand
Resistive crossbars have attracted significant interest in the design of Deep Neural Network (DNN) accelerators due to their ability to natively execute massively parallel vector-matrix multiplications within dense memory arrays. However, crossbar-based computations face a major challenge due to a variety of device and circuit-level non-idealities, which manifest as errors in the vector-matrix multiplications and eventually degrade DNN accuracy. To address this challenge, there is a need for tools that can model the functional impact of non-idealities on DNN training and inference. Existing efforts towards this goal are either limited to inference or too slow to be used for large-scale DNN training. We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware considering the impact of non-idealities. The key features of TxSim that differentiate it from prior efforts are: (i) it comprehensively models non-idealities during all training operations (forward propagation, backward propagation, and weight update) and (ii) it achieves computational efficiency by mapping crossbar evaluations to well-optimized BLAS routines and incorporates speedup techniques to further reduce simulation time with minimal impact on accuracy. TxSim achieves orders-of-magnitude improvement in simulation speed over prior works, and thereby makes it feasible to evaluate training of large-scale DNNs on crossbars. Our experiments using TxSim reveal that the accuracy degradation in DNN training due to non-idealities can be substantial (3%-10%) for large-scale DNNs, underscoring the need for further research in mitigation techniques. We also analyze the impact of various device and circuit-level parameters and the associated non-idealities to provide key insights that can guide the design of crossbar-based DNN training accelerators.
Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions
Golrezaei, Negin, Javanmard, Adel, Mirrokni, Vahab
Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers' valuations, i.e., buyers' preferences. The seller's goal is to design a learning policy to set reserve prices via observing the past sales data, and her objective is to minimize her regret for revenue, where the regret is computed against a clairvoyant policy that knows buyers' heterogeneous preferences. Given the seller's goal, utility-maximizing buyers have the incentive to bid untruthfully in order to manipulate the seller's learning policy. We propose learning policies that are robust to such strategic behavior. These policies use the outcomes of the auctions, rather than the submitted bids, to estimate the preferences while controlling the long-term effect of the outcome of each auction on the future reserve prices. When the market noise distribution is known to the seller, we propose a policy called Contextual Robust Pricing (CORP) that achieves a T-period regret of $O(d\log(Td) \log (T))$, where $d$ is the dimension of the contextual information. When the market noise distribution is unknown to the seller, we propose two policies whose regrets are sublinear in $T$.
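As a minimal sketch of the auction format this abstract refers to, the following Python function computes the seller's revenue in a single second-price auction with a reserve price. It assumes the textbook rule: the item sells only if the highest bid meets the reserve, and the winner pays the larger of the reserve and the second-highest bid. The function name and the example bids are hypothetical, and this is not the CORP policy itself, only the per-round revenue that such a policy would try to maximize.

```python
from typing import Sequence


def second_price_revenue(bids: Sequence[float], reserve: float) -> float:
    """Seller revenue in one second-price auction with a reserve price.

    Assumed rule (textbook form, not taken from the paper):
    - no sale if the highest bid is below the reserve;
    - otherwise the winner pays max(reserve, second-highest bid).
    """
    if not bids:
        return 0.0
    ranked = sorted(bids, reverse=True)
    if ranked[0] < reserve:
        return 0.0  # reserve not met, item stays unsold
    second = ranked[1] if len(ranked) > 1 else 0.0
    return float(max(reserve, second))


# Hypothetical bids from two buyers, evaluated at three reserve prices.
low_reserve = second_price_revenue([5.0, 3.0], reserve=2.0)
mid_reserve = second_price_revenue([5.0, 3.0], reserve=4.0)
high_reserve = second_price_revenue([5.0, 3.0], reserve=6.0)
```

With these bids, a reserve between the two bids raises revenue above the second-highest bid, while a reserve above the highest bid loses the sale entirely, which is why the choice of reserve matters to the seller.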
Relevant-features based Auxiliary Cells for Energy Efficient Detection of Natural Errors
Aketi, Sai Aparna, Panda, Priyadarshini, Roy, Kaushik
Deep neural networks have demonstrated state-of-the-art performance on many classification tasks. However, they have no inherent capability to recognize when their predictions are wrong. There have been several recent efforts to detect natural errors, but the suggested mechanisms impose additional energy requirements. To address this issue, we propose an ensemble of classifiers at hidden layers to enable energy-efficient detection of natural errors. In particular, we append Relevant-features based Auxiliary Cells (RACs), which are class-specific binary linear classifiers trained on relevant features. The consensus of RACs is used to detect natural errors. Based on the combined confidence of RACs, classification can be terminated early, thereby resulting in energy-efficient detection. We demonstrate the effectiveness of our technique on various image classification datasets such as CIFAR-10, CIFAR-100 and Tiny-ImageNet.
Stochastic Normalizing Flows
Hodgkinson, Liam, van der Heide, Chris, Roosta, Fred, Mahoney, Michael W.
Normalizing flows (Rezende & Mohamed, 2015) are probabilistic models constructed as a sequence of successive transformations applied to some initial distribution. A key strength of normalizing flows is their expressive power as generative models, while enjoying an explicitly computable form of the likelihood function evaluated on the transformed space. This makes them especially well-equipped for variational inference (VI). Neural networks are often used as inspiration for finding effective transformations (Dinh et al., 2015; van den Berg et al., 2018). Continuous normalizing flows were later developed in Chen et al. (2018) as a means to perform maximum likelihood estimation and VI for large-scale probabilistic models derived from ordinary differential equations (ODEs).
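To illustrate the explicitly computable likelihood mentioned above, here is a minimal sketch of the change-of-variables rule for a chain of one-dimensional affine maps applied to a standard normal base distribution. Affine maps are chosen only because their inverse and Jacobian are trivial; the function names and parameter values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def standard_normal_logpdf(z: float) -> float:
    """Log-density of the N(0, 1) base distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def affine_flow_logpdf(x: float, scales: Sequence[float], shifts: Sequence[float]) -> float:
    """Log-density of x after applying z -> a * z + b for each (a, b) in order.

    Change of variables: log q(x) = log q0(z0) - sum_k log|a_k|,
    where z0 is recovered by inverting the maps in reverse order.
    """
    z = x
    log_det_sum = 0.0
    for a, b in zip(reversed(scales), reversed(shifts)):
        z = (z - b) / a                 # invert one affine map
        log_det_sum += math.log(abs(a))  # log|d(a*z+b)/dz| = log|a|
    return standard_normal_logpdf(z) - log_det_sum


# A single map z -> 3z + 1 turns N(0, 1) into N(1, 9); evaluate at the mean.
logp_at_mean = affine_flow_logpdf(1.0, scales=[3.0], shifts=[1.0])
```

The same bookkeeping (invert each map, accumulate the log-determinant of its Jacobian) carries over to higher-dimensional transformations, where the determinant is the expensive part.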
Be Like Water: Robustness to Extraneous Variables Via Adaptive Feature Normalization
Kaku, Aakash, Mohan, Sreyas, Parnandi, Avinash, Schambra, Heidi, Fernandez-Granda, Carlos
Extraneous variables are variables that are irrelevant for a certain task, but heavily affect the distribution of the available data. In this work, we show that the presence of such variables can degrade the performance of deep-learning models. We study three datasets where there is a strong influence of known extraneous variables: classification of upper-body movements in stroke patients, annotation of surgical activities, and recognition of corrupted images. Models trained with batch normalization learn features that are highly dependent on the extraneous variables. In batch normalization, the statistics used to normalize the features are learned from the training set and fixed at test time, which produces a mismatch in the presence of varying extraneous variables. We demonstrate that estimating the feature statistics adaptively during inference, as in instance normalization, addresses this issue, producing normalized features that are more robust to changes in the extraneous variables. This results in a significant gain in performance for different network architectures and choices of feature statistics.
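The mismatch described above can be sketched with a single feature channel. The following Python example assumes, purely for illustration, that an extraneous variable shifts and rescales the test features relative to the training features; the shift and scale values and the function names are hypothetical. It compares normalization with fixed training statistics (as in batch normalization at test time) against normalization with statistics estimated from the test input itself (as in instance normalization).

```python
import math
import random
from statistics import fmean, pvariance
from typing import List, Sequence


def normalize_fixed(x: Sequence[float], mean: float, var: float, eps: float = 1e-5) -> List[float]:
    """Batch-norm style inference: statistics come from the training set."""
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def normalize_adaptive(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Instance-norm style inference: statistics come from the input itself."""
    mean, var = fmean(x), pvariance(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(1000)]
train_mean, train_var = fmean(train), pvariance(train)

# Hypothetical extraneous variable: test features are scaled by 3 and shifted by 5.
test = [3.0 * random.gauss(0.0, 1.0) + 5.0 for _ in range(1000)]

fixed_out = normalize_fixed(test, train_mean, train_var)
adaptive_out = normalize_adaptive(test)
```

Under this assumed shift, `fixed_out` stays far from zero mean and unit variance, whereas `adaptive_out` is standardized by construction, so downstream layers see features on the scale they were trained on.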
A General Method for Robust Learning from Batches
In many applications, data is collected in batches, some of which are corrupt or even adversarial. Recent work derived optimal robust algorithms for estimating discrete distributions in this setting. We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains. Building on these results, we derive the first robust agnostic computationally-efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and gaussian-mixture distribution estimation.
Causal Inference With Selectively-Deconfounded Data
Gan, Kyra, Li, Andrew A., Lipton, Zachary C., Tayur, Sridhar
Given only data generated by a standard confounding graph with unobserved confounder, the Average Treatment Effect (ATE) is not identifiable. To estimate the ATE, a practitioner must then either (a) collect deconfounded data; (b) run a clinical trial; or (c) elucidate further properties of the causal graph that might render the ATE identifiable. In this paper, we consider the benefit of incorporating a (large) confounded observational dataset alongside a (small) deconfounded observational dataset when estimating the ATE. Our theoretical results show that the inclusion of confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level. Moreover, in some cases (say, genetics) we could imagine retrospectively selecting samples to deconfound. We demonstrate that by strategically selecting these examples based upon the (already observed) treatment and outcome, we can reduce our data dependence further. Our theoretical and empirical results establish that the worst-case relative performance of our approach (vs. a natural benchmark) is bounded while our best-case gains are unbounded. Next, we demonstrate the benefits of selective deconfounding using a large real-world dataset related to genetic mutation in cancer. Finally, we introduce an online version of the problem, proposing two adaptive heuristics.
The Curious Case of Adversarially Robust Models: More Data Can Help, Double Descend, or Hurt Generalization
Min, Yifei, Chen, Lin, Karbasi, Amin
Despite remarkable success, deep neural networks are sensitive to human-imperceptible small perturbations on the data and could be adversarially misled to produce incorrect or even dangerous predictions. To circumvent these issues, practitioners introduced adversarial training to produce adversarially robust models whose predictions are robust to small perturbations to the data. It is widely believed that more training data will help adversarially robust models generalize better on the test data. In this paper, however, we challenge this conventional belief and show that more training data could hurt the generalization of adversarially robust models for the linear classification problem. We identify three regimes based on the strength of the adversary. In the weak adversary regime, more data improves the generalization of adversarially robust models. In the medium adversary regime, with more training data, the generalization loss exhibits a double descent curve. This implies that in this regime, there is an intermediate stage where more training data hurts their generalization. In the strong adversary regime, more data almost immediately causes the generalization error to increase.
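To make the notion of adversary strength concrete for a linear classifier, the sketch below assumes perturbations bounded in the l-infinity norm by a radius `eps`; the abstract does not specify the perturbation set, so this choice, the function names, and the example values are all assumptions. For a linear score, the worst-case margin then has the closed form y (w . x) - eps * ||w||_1, which the code cross-checks by enumerating the corners of the perturbation box.

```python
import itertools
from typing import Sequence


def worst_case_margin(w: Sequence[float], x: Sequence[float], y: int, eps: float) -> float:
    """Closed-form worst-case margin under an assumed l-infinity adversary."""
    clean_margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return clean_margin - eps * sum(abs(wi) for wi in w)


def brute_force_margin(w: Sequence[float], x: Sequence[float], y: int, eps: float) -> float:
    """Minimum margin over the corners of the l-infinity ball.

    A linear function attains its minimum over a box at a corner,
    so checking every sign pattern is exact (but exponential in dimension).
    """
    margins = []
    for signs in itertools.product((-1.0, 1.0), repeat=len(x)):
        perturbed = [xi + eps * s for xi, s in zip(x, signs)]
        margins.append(y * sum(wi * pi for wi, pi in zip(w, perturbed)))
    return min(margins)


# Hypothetical two-dimensional example.
w, x, y = [1.0, -2.0], [3.0, 1.0], 1
clean = worst_case_margin(w, x, y, eps=0.0)
robust = worst_case_margin(w, x, y, eps=0.5)
```

In this example the point is classified correctly without an adversary (margin 1.0) but not robustly at `eps=0.5` (margin -0.5), which is the kind of gap that separates standard from adversarially robust generalization.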
Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
Hanna, Serge Kas, Bitar, Rawad, Parag, Parimal, Dasari, Venkat, Rouayheb, Salim El
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k