Goto

Collaborating Authors

 Performance Analysis


Datacratic MLDB

#artificialintelligence

The business world is full of streams of items that need to be filtered or evaluated: parts on an assembly line, resumés in an application pile, emails in a delivery queue, transactions awaiting processing. Machine learning techniques are increasingly being used to make such processes more efficient: image processing to flag bad parts, text analysis to surface good candidates, spam filtering to sort email, fraud detection to lower transaction costs etc. In this article, I show how you can take business factors into account when using machine learning to solve these kinds of problems with binary classifiers. Specifically, I show how the concept of expected utility from the field of economics maps onto the Receiver Operating Characteristic (ROC) space often used by machine learning practitioners to compare and evaluate models for binary classification. I begin with a parable illustrating the dangers of not taking such factors into account. This concrete story is followed by a more formal mathematical look at the use of indifference curves in ROC space to avoid this kind of problem and guide model development. I wrap up with some recommendations for successfully using binary classifiers to solve business problems.


Energy Disaggregation for Real-Time Building Flexibility Detection

arXiv.org Machine Learning

Energy is a limited resource which has to be managed wisely, taking into account both supply-demand matching and capacity constraints in the distribution grid. One aspect of the smart energy management at the building level is given by the problem of real-time detection of flexible demand available. In this paper we propose the use of energy disaggregation techniques to perform this task. Firstly, we investigate the use of existing classification methods to perform energy disaggregation. A comparison is performed between four classifiers, namely Naive Bayes, k-Nearest Neighbors, Support Vector Machine and AdaBoost. Secondly, we propose the use of Restricted Boltzmann Machine to automatically perform feature extraction. The extracted features are then used as inputs to the four classifiers and consequently shown to improve their accuracy. The efficiency of our approach is demonstrated on a real database consisting of detailed appliance-level measurements with high temporal resolution, which has been used for energy disaggregation in previous studies, namely the REDD. The results show robustness and good generalization capabilities to newly presented buildings with at least 96% accuracy.


A note on adjusting $R^2$ for using with cross-validation

arXiv.org Machine Learning

We show how to adjust the coefficient of determination ($R^2$) when used for measuring predictive accuracy via leave-one-out cross-validation.


Multilingual Twitter Sentiment Classification: The Role of Human Annotators

arXiv.org Artificial Intelligence

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.



Classical Statistics and Statistical Learning in Imaging Neuroscience

arXiv.org Machine Learning

Neuroimaging research has predominantly drawn conclusions based on classical statistics, including null-hypothesis testing, t-tests, and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity, including cross-validation, pattern classification, and sparsity-inducing regression. These two methodological families used for neuroimaging data analysis can be viewed as two extremes of a continuum. Yet, they originated from different historical contexts, build on different theories, rest on different assumptions, evaluate different outcome metrics, and permit different conclusions. This paper portrays commonalities and differences between classical statistics and statistical learning with their relation to neuroimaging research. The conceptual implications are illustrated in three common analysis scenarios. It is thus tried to resolve possible confusion between classical hypothesis testing and data-guided model estimation by discussing their ramifications for the neuroimaging access to neurobiology.


Efficient Distributed Estimation of Inverse Covariance Matrices

arXiv.org Machine Learning

ABSTRACT In distributed systems, communication is a major concern due to issues such as its vulnerability or efficiency. In this paper, we are interested in estimating sparse inverse covariance matrices when samples are distributed into different machines. We address communication efficiency by proposing a method where, in a single round of communication, each machine transfers a small subset of the entries of the inverse covariance matrix. We show that, with this efficient distributed method, the error rates can be comparable with estimation in a non-distributed setting, and correct model selection is still possible. Practical performance is shown through simulations.


An evaluation of randomized machine learning methods for redundant data: Predicting short and medium-term suicide risk from administrative records and risk assessments

arXiv.org Machine Learning

Accurate prediction of suicide risk in mental health patients remains an open problem. Existing methods including clinician judgments have acceptable sensitivity, but yield many false positives. Exploiting administrative data has a great potential, but the data has high dimensionality and redundancies in the recording processes. We investigate the efficacy of three most effective randomized machine learning techniques - random forests, gradient boosting machines, and deep neural nets with dropout - in predicting suicide risk. Using a cohort of mental health patients from a regional Australian hospital, we compare the predictive performance with popular traditional approaches - clinician judgments based on a checklist, sparse logistic regression and decision trees. The randomized methods demonstrated robustness against data redundancies and superior predictive performance on AUC and F-measure. Keywords: Suicide risk, Electronic medical record, Predictive models, Randomized machine learning, Deep learning 1. Introduction Every year, about 2000 Australians die by suicide causing huge trauma to families, friends, workplaces and communities[1].


Personalized Risk Scoring for Critical Care Patients using Mixtures of Gaussian Process Experts

arXiv.org Machine Learning

We develop a personalized real time risk scoring algorithm that provides timely and granular assessments for the clinical acuity of ward patients based on their (temporal) lab tests and vital signs. Heterogeneity of the patients population is captured via a hierarchical latent class model. The proposed algorithm aims to discover the number of latent classes in the patients population, and train a mixture of Gaussian Process (GP) experts, where each expert models the physiological data streams associated with a specific class. Self-taught transfer learning is used to transfer the knowledge of latent classes learned from the domain of clinically stable patients to the domain of clinically deteriorating patients. For new patients, the posterior beliefs of all GP experts about the patient's clinical status given her physiological data stream are computed, and a personalized risk score is evaluated as a weighted average of those beliefs, where the weights are learned from the patient's hospital admission information. Experiments on a heterogeneous cohort of 6,313 patients admitted to Ronald Regan UCLA medical center show that our risk score outperforms the currently deployed risk scores, such as MEWS and Rothman scores.


Contrastive Structured Anomaly Detection for Gaussian Graphical Models

arXiv.org Machine Learning

Gaussian graphical models (GGMs) are probabilistic tools of choice for analyzing conditional dependencies between variables in complex systems. Finding changepoints in the structural evolution of a GGM is therefore essential to detecting anomalies in the underlying system modeled by the GGM. In order to detect structural anomalies in a GGM, we consider the problem of estimating changes in the precision matrix of the corresponding Gaussian distribution. We take a two-step approach to solving this problem:- (i) estimating a background precision matrix using system observations from the past without any anomalies, and (ii) estimating a foreground precision matrix using a sliding temporal window during anomaly monitoring. Our primary contribution is in estimating the foreground precision using a novel contrastive inverse covariance estimation procedure. In order to accurately learn only the structural changes to the GGM, we maximize a penalized log-likelihood where the penalty is the $l_1$ norm of difference between the foreground precision being estimated and the already learned background precision. We modify the alternating direction method of multipliers (ADMM) algorithm for sparse inverse covariance estimation to perform contrastive estimation of the foreground precision matrix. Our results on simulated GGM data show significant improvement in precision and recall for detecting structural changes to the GGM, compared to a non-contrastive sliding window baseline.