Ravikumar, Pradeep
Evaluations and Methods for Explanation through Robustness Analysis
Hsieh, Cheng-Yu, Yeh, Chih-Kuan, Liu, Xuanqing, Ravikumar, Pradeep, Kim, Seungyeon, Kumar, Sanjiv, Hsieh, Cho-Jui
Among the many ways of interpreting a machine learning model, measuring the importance of a set of features tied to a prediction is probably one of the most intuitive ways to explain a model. In this paper, we establish a link between a set of features and a prediction with a new evaluation criterion, robustness analysis, which measures the minimum distortion distance of an adversarial perturbation. By measuring the tolerance level for an adversarial attack, we can extract the set of features that provides the most robust support for a prediction, and, by mounting a targeted adversarial attack, we can also extract a set of features that contrasts the current prediction with a target class. Applying this methodology to various prediction tasks across multiple domains, we observe that the derived explanations indeed capture the significant feature sets, both qualitatively and quantitatively.
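To make the criterion concrete, here is a minimal numpy sketch of the general recipe, not the paper's actual algorithm: estimate the smallest perturbation radius, restricted to a candidate feature set, that flips the prediction, then greedily grow the set of features whose perturbation flips it most easily. The `model.predict` interface and the random-direction attack are illustrative assumptions.

```python
import numpy as np

def min_distortion(model, x, feature_set,
                   eps_grid=np.linspace(0.05, 1.0, 20), n_trials=100):
    """Estimate the smallest perturbation norm, restricted to feature_set,
    that flips the prediction, via random-direction sampling.
    `model.predict` (x -> class label) is an assumed interface."""
    y0 = model.predict(x)
    mask = np.zeros_like(x)
    mask[list(feature_set)] = 1.0
    for eps in eps_grid:                      # smallest flipping radius wins
        for _ in range(n_trials):
            delta = np.random.randn(*x.shape) * mask
            delta *= eps / (np.linalg.norm(delta) + 1e-12)
            if model.predict(x + delta) != y0:
                return eps
    return np.inf                             # robust at all tested radii

def greedy_robust_support(model, x, k):
    """Greedily pick the k features whose perturbation flips the prediction
    at the smallest radius, i.e. the least adversarially tolerant set."""
    chosen = []
    for _ in range(k):
        rest = [f for f in range(x.size) if f not in chosen]
        chosen.append(min(rest, key=lambda f: min_distortion(model, x, chosen + [f])))
    return chosen
```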
Class-Weighted Classification: Trade-offs and Robust Approaches
Xu, Ziyu, Dan, Chen, Khim, Justin, Ravikumar, Pradeep
We address imbalanced classification, the problem in which a label may have low marginal probability relative to other labels, by weighting losses according to the correct class. First, we examine the convergence rates of the expected excess weighted risk of plug-in classifiers in which the weighting for the plug-in classifier and for the risk may differ. This leads to irreducible errors that do not converge to the weighted Bayes risk, which motivates our consideration of robust risks. We define a robust risk that minimizes risk over a set of weightings, and show excess risk bounds for this problem. We then show that particular choices of the weighting set lead to a special instance of conditional value at risk (CVaR) from stochastic programming, which we call label conditional value at risk (LCVaR). We further generalize this weighting to derive a new robust risk problem that we call label heterogeneous conditional value at risk (LHCVaR). Finally, we empirically demonstrate the efficacy of LCVaR and LHCVaR in improving class-conditional risks.
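As a rough illustration of the CVaR-over-labels idea, via the standard CVaR variational formula rather than the paper's exact LCVaR derivation, the robust risk can be computed from per-class risks by minimizing over a scalar dual variable whose optimum lies at one of the per-class risk values:

```python
import numpy as np

def label_cvar(per_class_risk, class_probs, alpha):
    """CVaR over class-conditional risks via the variational formula
    CVaR_a = min_l { l + E[(R_y - l)_+] / a }; with finitely many classes
    the optimal l is one of the per-class risk values."""
    per_class_risk = np.asarray(per_class_risk, dtype=float)
    vals = [lam + np.dot(class_probs, np.maximum(per_class_risk - lam, 0.0)) / alpha
            for lam in per_class_risk]
    return min(vals)

# alpha = 1 recovers the ordinary class-weighted average risk;
# small alpha focuses the objective on the worst-off classes.
```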
Learning Sparse Nonparametric DAGs
Zheng, Xun, Dan, Chen, Aragam, Bryon, Ravikumar, Pradeep, Xing, Eric P.
We develop a framework for learning sparse nonparametric directed acyclic graphs (DAGs) from data. Our approach is based on a recent algebraic characterization of DAGs that led to the first fully continuous optimization for score-based learning of DAG models parametrized by a linear structural equation model (SEM). We extend this algebraic characterization to nonparametric SEMs by leveraging nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that can be applied to a variety of nonparametric and semiparametric models, including GLMs, additive noise models, and index models as special cases. We also explore the use of neural networks and orthogonal basis expansions to model nonlinearities in general nonparametric models. An extensive empirical study confirms the necessity of nonlinear dependencies and the advantage of continuous optimization for score-based learning.
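The algebraic characterization referred to is, to our reading, the NOTEARS-style constraint that a weighted adjacency matrix W encodes a DAG exactly when a smooth trace-of-matrix-exponential function vanishes; a minimal sketch of that constraint:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """NOTEARS-style acyclicity score: h(W) = tr(exp(W * W)) - d is zero
    iff the weighted adjacency matrix W has no directed cycles
    (the elementwise square keeps entries nonnegative)."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

# h(W), with gradient expm(W * W).T * 2W, can be added as a penalty to any
# smooth score, turning combinatorial DAG search into continuous optimization.
```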
A Unified Approach to Robust Mean Estimation
Prasad, Adarsh, Balakrishnan, Sivaraman, Ravikumar, Pradeep
Modern datasets that arise in various branches of science and engineering are characterized by their ever-increasing scale and richness. This is spurred in part by easier access to computing, the internet, and various sensor-based technologies that enable the collection of such varied datasets. On the flip side, these large and rich datasets are no longer carefully curated, are often collected in a decentralized, distributed fashion, and are consequently plagued with the complexities of heterogeneity, adversarial manipulation, and outliers. The analysis of these huge datasets is thus fraught with methodological challenges. Understanding the fundamental challenges and tradeoffs in handling such "dirty data" is precisely the premise of the field of robust statistics. Here, the aforementioned complexities are largely formalized under two different models of robustness: (1) the heavy-tailed model, where the sampling distribution can have thick tails, for instance, only low-order moments of the distribution are assumed to be finite; and (2) the ε-contamination model, where the sampling distribution is modeled as a well-behaved distribution contaminated by an ε fraction of arbitrary outliers. In each case, classical estimators of the distribution (based, for instance, on the maximum likelihood estimator) can behave considerably worse (potentially arbitrarily worse) than in standard settings where the data is better behaved and satisfies various regularity properties. In particular, these classical estimators can be extremely sensitive to the tails of the distribution or to the outliers, so the broad goal in robust statistics is to construct estimators that improve on the classical ones by reducing their sensitivity to outliers.
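For a feel of what robustness to heavy tails looks like in code, here is the textbook median-of-means estimator, a standard building block in this literature rather than the paper's unified estimator:

```python
import numpy as np

def median_of_means(x, k, seed=0):
    """Textbook median-of-means: shuffle, split into k blocks, average each
    block, and return the median of the block means. Far less sensitive to
    heavy tails and stray outliers than the empirical mean."""
    x = np.random.default_rng(seed).permutation(x)
    return np.median([b.mean() for b in np.array_split(x, k)])
```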
Adaptive Hard Thresholding for Near-optimal Consistent Robust Regression
Suggala, Arun Sai, Bhatia, Kush, Ravikumar, Pradeep, Jain, Prateek
We study the problem of robust linear regression with response-variable corruptions. We consider the oblivious adversary model, where the adversary corrupts a fraction of the responses in complete ignorance of the data. We provide a nearly-linear-time estimator that consistently estimates the true regression vector, even with a $1-o(1)$ fraction of corruptions. Existing results in this setting either do not guarantee consistent estimates or can handle only a small fraction of corruptions. We also extend our estimator to robust sparse linear regression and show that similar guarantees hold in that setting. Finally, we apply our estimator to the problem of linear regression with heavy-tailed noise and show that it consistently estimates the regression vector even when the noise has unbounded variance (e.g., the Cauchy distribution), a regime where most existing results do not even apply. Our estimator is based on a novel variant of outlier removal via hard thresholding, in which the threshold is chosen adaptively and crucially relies on randomness to escape bad fixed points of the non-convex hard thresholding operation.
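A minimal sketch of the general hard-thresholding scheme this builds on, with a fixed rather than adaptive threshold, so it illustrates the family of methods and not the paper's estimator:

```python
import numpy as np

def robust_regression_ht(X, y, frac_keep=0.8, n_iters=50):
    """Alternating scheme: least squares on the currently trusted points,
    then re-trust the points with the smallest residuals. The kept fraction
    is a fixed threshold, standing in for the paper's adaptive choice."""
    n = len(y)
    keep = np.arange(n)                       # start by trusting everything
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = np.abs(y - X @ w)
        keep = np.argsort(resid)[: int(frac_keep * n)]
    return w
```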
How Sensitive are Sensitivity-Based Explanations?
Yeh, Chih-Kuan, Hsieh, Cheng-Yu, Suggala, Arun Sai, Inouye, David, Ravikumar, Pradeep
We propose a simple objective evaluation measure for explanations of a complex black-box machine learning model. While most such model explanations have largely been evaluated via qualitative measures, such as how humans might qualitatively perceive the explanations, it is vital to also consider objective measures such as the one we propose in this paper. Our evaluation measure, which we naturally call sensitivity, is simple: it characterizes how an explanation changes as we vary the test input; depending on how we measure these changes and how we vary the input, we arrive at different notions of sensitivity. We also provide a calculus for deriving the sensitivity of complex explanations in terms of that of simpler explanations, which allows easy computation of sensitivities for yet-to-be-proposed explanations. One advantage of an objective evaluation measure is that we can optimize the explanation with respect to the measure: we show that (1) any given explanation can be simply modified to improve its sensitivity with just a modest deviation from the original explanation, and (2) gradient-based explanations of an adversarially trained network are less sensitive. Perhaps surprisingly, our experiments show that explanations optimized to have lower sensitivity can be more faithful to the model predictions.
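A Monte Carlo sketch of one such notion, the maximum change in an attribution under small input perturbations; the `explainer` callable (input to attribution vector) is an assumed interface, and this is an illustration rather than the paper's definition:

```python
import numpy as np

def max_sensitivity(explainer, x, radius=0.1, n_samples=100, seed=0):
    """Monte Carlo estimate of max-sensitivity: the largest change in the
    explanation over random perturbations of norm at most `radius`."""
    rng = np.random.default_rng(seed)
    base = explainer(x)
    worst = 0.0
    for _ in range(n_samples):
        delta = rng.uniform(-1.0, 1.0, size=x.shape)
        delta *= radius / (np.linalg.norm(delta) + 1e-12)
        worst = max(worst, np.linalg.norm(explainer(x + delta) - base))
    return worst
```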
Towards Aggregating Weighted Feature Attributions
Bhatt, Umang, Ravikumar, Pradeep, Moura, Jose M. F.
Current approaches for explaining machine learning models fall into two distinct classes: antecedent event influence and value attribution. The former leverages training instances to describe how much influence a training point exerts on a test point, while the latter attempts to attribute value to the features most pertinent to a given prediction. In this work, we discuss an algorithm, AVA: Aggregate Valuation of Antecedents, that fuses these two explanation classes into a new approach to feature attribution that not only retrieves local explanations but also captures global patterns learned by a model. Our experiments convincingly favor weighting and aggregating feature attributions via AVA.
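One hypothetical reading of the fusion step, offered purely as illustration and not as the paper's exact algorithm: weight each antecedent training point's feature-attribution vector by its influence on the test point, then aggregate.

```python
import numpy as np

def aggregate_attributions(attributions, influences):
    """Influence-weighted aggregation: rows of `attributions` (shape (n, d))
    are per-antecedent feature attributions; `influences` (shape (n,)) are
    the training points' influence scores on the test point."""
    w = np.maximum(np.asarray(influences, dtype=float), 0.0)
    w /= w.sum() + 1e-12                      # normalize influence weights
    return w @ attributions                   # (n,) @ (n, d) -> (d,)
```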
A Voting-Based System for Ethical Decision Making
Noothigattu, Ritesh, Gaikwad, Snehalkumar 'Neil' S., Awad, Edmond, Dsouza, Sohan, Rahwan, Iyad, Ravikumar, Pradeep, Procaccia, Ariel D.
The problem of ethical decision making, which has long been a grand challenge for AI [23], has recently caught the public imagination. Perhaps its best-known manifestation is a modern variant of the classic trolley problem [10]: An autonomous vehicle has a brake failure, leading to an accident with inevitably tragic consequences; due to the vehicle's superior perception and computation capabilities, it can make an informed decision. Should it stay its course and hit a wall, killing its three passengers, one of whom is a young girl? Or swerve and kill a male athlete and his dog, who are crossing the street on a red light? A notable paper by Bonnefon et al. [2] has shed some light on how people address such questions, and even former US President Barack Obama has weighed in.
Word Mover's Embedding: From Word2Vec to Document Embedding
Wu, Lingfei, Yen, Ian E. H., Xu, Kun, Xu, Fangli, Balakrishnan, Avinash, Chen, Pin-Yu, Ravikumar, Pradeep, Witbrock, Michael J.
While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively little success in extending it to generate unsupervised sentence or document embeddings. Recent work has demonstrated that a distance measure between documents called the Word Mover's Distance (WMD), which aligns semantically similar words, yields unprecedented KNN classification accuracy. However, WMD is expensive to compute, and it is hard to extend its use beyond a KNN classifier. In this paper, we propose the Word Mover's Embedding (WME), a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings. In our experiments on 9 benchmark text classification datasets and 22 textual similarity tasks, the proposed technique consistently matches or outperforms state-of-the-art techniques, with significantly higher accuracy on shorter documents.
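A simplified sketch of the embedding idea: one feature per random reference document, through a distance-based feature map. The relaxed WMD lower bound stands in for exact WMD, and the kernel form and sampling scheme here are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def relaxed_wmd(A, B):
    """Relaxed Word Mover's Distance lower bound: every word vector in one
    document moves wholly to its nearest word vector in the other; take the
    larger of the two one-sided costs. A, B: (n_words, dim) embeddings."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())

def wme_features(doc, random_docs, gamma=1.0):
    """One embedding coordinate per random reference document, via the
    distance-based feature map exp(-gamma * distance)."""
    return np.array([np.exp(-gamma * relaxed_wmd(doc, w)) for w in random_docs])
```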
Learning Tensor Latent Features
Chang, Sung-En, Zheng, Xun, Yen, Ian E. H., Ravikumar, Pradeep, Yu, Rose
Compared to classic latent factor models [14], latent feature models have two main benefits: (1) interpretability: the binary codes directly reveal whether certain features exist in the data, thus providing more interpretable latent profiles [25]; (2) scalability: compared with real-valued codes, binary codes require fewer bits to store, cutting down the memory footprint and making the models easier to deploy in memory-constrained environments such as mobile devices. Tensor latent feature models generalize traditional matrix latent feature models to represent high-order correlation structures in the data. For example, in spatiotemporal recommender systems, the observations are user activities over different locations and times. We want to learn the latent features and codes that correspond to user, space, and time simultaneously, without assuming conditional independence of these three dimensions. In this case, we can first represent such data as a high-order tensor and assign binary codes encoding the presence or absence of rows for individual modes of the tensor. These codes can then help us answer the "when" and "where" questions regarding the learned user preferences. Besides the non-convex formulation of the maximum likelihood estimation (MLE) objective, learning latent feature models is further complicated by the combinatorial nature of the codes.
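A generic Tucker-style sketch of what this model class computes, with binary code matrices selecting entries of a real-valued core tensor; this illustrates the representation only, not the paper's learning algorithm.

```python
import numpy as np

def reconstruct(core, U, V, W):
    """Tucker-style reconstruction: binary code matrices U (users x r1),
    V (locations x r2), W (times x r3) switch entries of a real-valued
    core tensor on or off along each mode."""
    return np.einsum('abc,ia,jb,kc->ijk', core, U, V, W)

# Usage sketch: 2 latent features per mode, codes in {0, 1}.
# core = np.random.randn(2, 2, 2)
# U = np.random.binomial(1, 0.5, size=(5, 2))   # user codes
# V = np.random.binomial(1, 0.5, size=(4, 2))   # location codes
# W = np.random.binomial(1, 0.5, size=(3, 2))   # time codes
# T = reconstruct(core, U, V, W)                # (5, 4, 3) tensor
```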