Performance Analysis
Lumos: A Library for Diagnosing Metric Regressions in Web-Scale Applications
Pool, Jamie, Beyrami, Ebrahim, Gopal, Vishak, Aazami, Ashkan, Gupchup, Jayant, Rowland, Jeff, Li, Binlong, Kanani, Pritesh, Cutler, Ross, Gehrke, Johannes
Web-scale applications can ship code on a daily to weekly cadence. These applications rely on online metrics to monitor the health of new releases. Regressions in metric values need to be detected and diagnosed as early as possible to reduce the disruption to users and product owners. Regressions in metrics can surface due to a variety of reasons: genuine product regressions, changes in user population, and bias due to telemetry loss (or processing) are among the common causes. Diagnosing the cause of these metric regressions is costly for engineering teams as they need to invest time in finding the root cause of the issue as soon as possible. We present Lumos, a Python library built using the principles of AB testing to systematically diagnose metric regressions to automate such analysis. Lumos has been deployed across the component teams in Microsoft's Real-Time Communication applications Skype and Microsoft Teams. It has enabled engineering teams to detect 100s of real changes in metrics and reject 1000s of false alarms detected by anomaly detectors. The application of Lumos has resulted in freeing up as much as 95% of the time allocated to metric-based investigations. In this work, we open source Lumos and present our results from applying it to two different components within the RTC group over millions of sessions. This general library can be coupled with any production system to manage the volume of alerting efficiently.
Solving the Phantom Inventory Problem: Near-optimal Entry-wise Anomaly Detection
Farias, Vivek F., Li, Andrew A., Peng, Tianyi
We observe that a crucial inventory management problem ('phantom inventory'), that by some measures costs retailers approximately 4% in annual sales can be viewed as a problem of identifying anomalies in a (low-rank) Poisson matrix. State of the art approaches to anomaly detection in low-rank matrices apparently fall short. Specifically, from a theoretical perspective, recovery guarantees for these approaches require that non-anomalous entries be observed with vanishingly small noise (which is not the case in our problem, and indeed in many applications). So motivated, we propose a conceptually simple entry-wise approach to anomaly detection in low-rank Poisson matrices. Our approach accommodates a general class of probabilistic anomaly models. We extend recent work on entry-wise error guarantees for matrix completion, establishing such guarantees for sub-exponential matrices, where in addition to missing entries, a fraction of entries are corrupted by (an also unknown) anomaly model. We show that for any given budget on the false positive rate (FPR), our approach achieves a true positive rate (TPR) that approaches the TPR of an (unachievable) optimal algorithm at a min-max optimal rate. Using data from a massive consumer goods retailer, we show that our approach provides significant improvements over incumbent approaches to anomaly detection.
Classification Performance Metric for Imbalance Data Based on Recall and Selectivity Normalized in Class Labels
In the classification of a class imbalance dataset, the performance measure used for the model selection and comparison to competing methods is a major issue. In order to overcome this problem several performance measures are defined and analyzed in several perspectives regarding in particular the imbalance ratio. There is still no clear indication which metric is universal and can be used for any skewed data problem. In this paper we introduced a new performance measure based on the harmonic mean of Recall and Selectivity normalized in class labels. This paper shows that the proposed performance measure has the right properties for the imbalanced dataset. In particular, in the space defined by the majority class examples and imbalance ratio it is less sensitive to changes in the majority class and more sensitive to changes in the minority class compared with other existing single-value performance measures. Additionally, the identity of the other performance measures has been proven analytically.
Machine learning-based clinical prediction modeling -- A practical guide for clinicians
Kernbach, Julius M., Staartjes, Victor E.
Staartjes have contributed equally to this series, and share first authorship. Abstract As analytical machine learning tools become readily available for clinicians to use, the understanding of key concepts and the awareness of analytical pitfalls are increasingly required for clinicians, investigators, reviewers and editors, who even as experts in their clinical field, sometimes find themselves insufficiently equipped to evaluate machine learning methodologies. In this section, we provide explanations on the general principles of machine learning, as well as analytical steps required for successful machine learning-based predictive modelling, which is the focus of this series. In particular, we define the terms machine learning, artificial intelligence, as well as supervised and unsupervised learning, continuing by introducing optimization, thus, the minimization of an objective error function as the central dogma of machine learning. In addition, we discuss why it is important to separate predictive and explanatory modelling, and most importantly state that a prediction model should not be used to make inferences. Lastly, we broadly describe a classical workflow for training a machine learning model, starting with data pre-processing and feature engineering and selection, continuing on with a training structure consisting of a resampling method, hyperparameter tuning, and model selection, and ending with evaluation of model discrimination and calibration as well as robust internal or external validation of the fully developed model. Methodological rigor and clarity as well as understanding of the underlying reasoning of the internal workings of a machine learning approach are required, otherwise predictive applications despite being strong analytical tools are not well accepted into the clinical routine.
On Counterfactual Explanations under Predictive Multiplicity
Pawelczyk, Martin, Broelemann, Klaus, Kasneci, Gjergji
Counterfactual explanations are usually obtained by identifying the smallest change made to an input to change a prediction made by a fixed model (hereafter called sparse methods). Recent work, however, has revitalized an old insight: there often does not exist one superior solution to a prediction problem with respect to commonly used measures of interest (e.g. error rate). In fact, often multiple different classifiers give almost equal solutions. This phenomenon is known as predictive multiplicity (Breiman, 2001; Marx et al., 2019). In this work, we derive a general upper bound for the costs of counterfactual explanations under predictive multiplicity. Most notably, it depends on a discrepancy notion between two classifiers, which describes how differently they treat negatively predicted individuals. We then compare sparse and data support approaches empirically on real-world data. The results show that data support methods are more robust to multiplicity of different models. At the same time, we show that those methods have provably higher cost of generating counterfactual explanations under one fixed model. In summary, our theoretical and empiricaln results challenge the commonly held view that counterfactual recommendations should be sparse in general.
Distance Correlation Sure Independence Screening for Accelerated Feature Selection in Parkinson's Disease Vocal Data
Schellhas, Dan, Neupane, Bishal, Thammineni, Deepak, Kanumuri, Bhargav, Green, Robert C. II
With the abundance of machine learning methods available and the temptation of using them all in an ensemble method, having a model-agnostic method of feature selection is incredibly alluring. Principal component analysis was developed in 1901 and has been a strong contender in this role since, but in the end is an unsupervised method. It offers no guarantee that the features that are selected have good predictive power because it does not know what is being predicted. To this end, Peng et al. developed the minimum redundancy-maximum relevance (mRMR) method in 2005. It uses the mutual information not only between predictors but also includes the mutual information with the response in its calculation. Estimating mutual information and entropy tend to be expensive and problematic endeavors, which leads to excessive processing times even for dataset that is approximately 750 by 750 in a Leave-One-Subject-Out jackknife situation. To remedy this, we use a method from 2012 called Distance Correlation Sure Independence Screening (DC-SIS) which uses the distance correlation measure of Sz\'ekely et al. to select features that have the greatest dependence with the response. We show that this method produces statistically indistinguishable results to the mRMR selection method on Parkinson's Disease vocal diagnosis data 90 times faster.
Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks
Croce, Francesco, Andriushchenko, Maksym, Singh, Naman D., Flammarion, Nicolas, Hein, Matthias
A large body of research has focused on adversarial attacks which require to modify all input features with small $l_2$- or $l_\infty$-norms. In this paper we instead focus on query-efficient sparse attacks in the black-box setting. Our versatile framework, Sparse-RS, based on random search achieves state-of-the-art success rate and query efficiency for different sparse attack models such as $l_0$-bounded perturbations (outperforming established white-box methods), adversarial patches, and adversarial framing. We show the effectiveness of Sparse-RS on different datasets considering problems from image recognition and malware detection and multiple variations of sparse threat models, including targeted and universal perturbations. In particular Sparse-RS can be used for realistic attacks such as universal adversarial patch attacks without requiring a substitute model. The code of our framework is available at https://github.com/fra31/sparse-rs.
From Predictions to Decisions: Using Lookahead Regularization
Rosenfeld, Nir, Hilgard, Sophie, Ravindranath, Sai Srivatsa, Parkes, David C.
Machine learning is a powerful tool for predicting human-related outcomes, from credit scores to heart attack risks. But when deployed, learned models also affect how users act in order to improve outcomes, whether predicted or real. The standard approach to learning is agnostic to induced user actions and provides no guarantees as to the effect of actions. We provide a framework for learning predictors that are both accurate and promote good actions. For this, we introduce look-ahead regularization which, by anticipating user actions, encourages predictive models to also induce actions that improve outcomes. This regularization carefully tailors the uncertainty estimates governing confidence in this improvement to the distribution of model-induced actions. We report the results of experiments on real and synthetic data that show the effectiveness of this approach.
Information-theoretic User Interaction: Significant Inputs for Program Synthesis
Tiwari, Ashish, Radhakrishna, Arjun, Gulwani, Sumit, Perelman, Daniel
Programming-by-example technologies are being deployed in industrial products for real-time synthesis of various kinds of data transformations. These technologies rely on the user to provide few representative examples of the transformation task. Motivated by the need to find the most pertinent question to ask the user, in this paper, we introduce the {\em significant questions problem}, and show that it is hard in general. We then develop an information-theoretic greedy approach for solving the problem. We justify the greedy algorithm using the conditional entropy result, which informally says that the question that achieves the maximum information gain is the one that we know least about. In the context of interactive program synthesis, we use the above result to develop an {\em{active program learner}} that generates the significant inputs to pose as queries to the user in each iteration. The procedure requires extending a {\em{passive program learner}} to a {\em{sampling program learner}} that is able to sample candidate programs from the set of all consistent programs to enable estimation of information gain. It also uses clustering of inputs based on features in the inputs and the corresponding outputs to sample a small set of candidate significant inputs. Our active learner is able to tradeoff false negatives for false positives and converge in a small number of iterations on a real-world dataset of %around 800 string transformation tasks.
HNHN: Hypergraph Networks with Hyperedge Neurons
Dong, Yihe, Sawin, Will, Bengio, Yoshua
Hypergraphs provide a natural representation for many real world datasets. We propose a novel framework, HNHN, for hypergraph representation learning. HNHN is a hypergraph convolution network with nonlinear activation functions applied to both hypernodes and hyperedges, combined with a normalization scheme that can flexibly adjust the importance of high-cardinality hyperedges and high-degree vertices depending on the dataset. We demonstrate improved performance of HNHN in both classification accuracy and speed on real world datasets when compared to state of the art methods.