performance metric


The ultimate guide to starting AI – Towards Data Science

#artificialintelligence

Many teams try to start an applied AI project by diving into algorithms and data before figuring out desired outputs and objectives. Unfortunately, that's like raising a puppy in a New York City apartment for a few years, then being surprised that it can't herd sheep for you. Instead, the first step is for the owner -- that's you! -- to form a clear vision of what you want from your dog (or ML/AI system) and how you'll know you've trained it successfully. My previous article discussed the why; now it's time to dive into how to do this first step for ML/AI, with all its gory little sub-steps. This reference guide is densely packed and long, so feel free to stick to large fonts and headings for a two-minute crash course. Cast of characters: decision-maker, ethicist, ML/AI engineer, analyst, qualitative expert, economist, psychologist, reliability engineer, AI researcher, domain expert, UX specialist, statistician, AI control theorist. The tasks we're about to tackle are the responsibility of the project's responsible adult. That's whoever calls the shots.


Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology

arXiv.org Machine Learning

Performance metrics (error measures) are vital components of the evaluation frameworks in various fields. The intention of this study was to provide an overview of a variety of performance metrics and approaches to their classification. The main goal of the study was to develop a typology that will help to improve our knowledge and understanding of metrics and facilitate their selection in machine learning regression, forecasting and prognostics. Based on the analysis of the structure of numerous performance metrics, we propose a framework of metrics which includes four (4) categories: primary metrics, extended metrics, composite metrics, and hybrid sets of metrics. The paper identified three (3) key components (dimensions) that determine the structure and properties of primary metrics: method of determining point distance, method of normalization, and method of aggregation of point distances over a data set.
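
The three components named in the abstract (point distance, normalization, aggregation) can be illustrated with a small sketch. This is not code from the paper: the helper names below, such as build_metric, are hypothetical, and the sketch only assumes that a primary metric can be assembled by choosing one option for each component.

```python
import math
from statistics import mean, median

# Component 1: point distance between an actual value and its prediction.
def absolute_distance(actual, predicted):
    return abs(actual - predicted)

def squared_distance(actual, predicted):
    return (actual - predicted) ** 2

# Component 2: normalization applied to each point distance.
def no_normalization(distance, actual):
    return distance

def normalize_by_actual(distance, actual):
    # Relative error; undefined when the actual value is zero.
    return distance / abs(actual)

# Component 3: aggregation over the data set (mean, median, ...).
def build_metric(distance, normalize, aggregate):
    """Assemble a metric from one choice per component (hypothetical helper)."""
    def metric(actuals, predictions):
        points = [normalize(distance(a, p), a)
                  for a, p in zip(actuals, predictions)]
        return aggregate(points)
    return metric

mae = build_metric(absolute_distance, no_normalization, mean)      # mean absolute error
mse = build_metric(squared_distance, no_normalization, mean)       # mean squared error
mape = build_metric(absolute_distance, normalize_by_actual, mean)  # mean absolute percentage error, as a fraction
mdae = build_metric(absolute_distance, no_normalization, median)   # median absolute error

def rmse(actuals, predictions):
    # Root mean squared error: MSE followed by a square root.
    return math.sqrt(mse(actuals, predictions))
```

Swapping a single component, for example median in place of mean, gives a different familiar metric without touching the other two choices.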


Increasing Trust in AI Services through Supplier's Declarations of Conformity

arXiv.org Artificial Intelligence

The accuracy and reliability of machine learning algorithms are an important concern for suppliers of artificial intelligence (AI) services, but considerations beyond accuracy, such as safety, security, and provenance, are also critical elements to engender consumers' trust in a service. In this paper, we propose a supplier's declaration of conformity (SDoC) for AI services to help increase trust in AI services. An SDoC is a transparent, standardized, but often not legally required, document used in many industries and sectors to describe the lineage of a product along with the safety and performance testing it has undergone. We envision an SDoC for AI services to contain purpose, performance, safety, security, and provenance information to be completed and voluntarily released by AI service providers for examination by consumers. Importantly, it conveys product-level rather than component-level functional testing. We suggest a set of declaration items tailored to AI and provide examples for two fictitious AI services.


Performance Metrics for Classification problems in Machine Learning

#artificialintelligence

After doing the usual feature engineering and selection, and of course implementing a model and getting some output in the form of a probability or a class, the next step is to find out how effective the model is, based on some metric computed on test datasets. Different performance metrics are used to evaluate different machine learning algorithms. For now, we will focus on the ones used for classification problems, such as log-loss, accuracy, and AUC (area under the curve). Precision and recall are further examples; they can also be used to evaluate ranking algorithms, such as those used by search engines. The metrics that you choose to evaluate your machine learning model are very important.
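
As a rough illustration of some of the metrics mentioned above, here is a minimal pure-Python sketch of accuracy, precision, recall, and log-loss for binary labels. The function names are illustrative rather than taken from the article, and labels are assumed to be encoded as 0 and 1.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def precision(y_true, y_pred):
    # Share of predicted positives that are truly positive.
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    # Share of true positives that the model found.
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-likelihood of the true labels.

    probs holds the predicted probability of class 1; values are clipped
    to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)
```

Accuracy, precision, and recall work on hard class predictions, while log-loss needs the predicted probabilities, which is one reason the choice of metric depends on what the model outputs.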


Walmart patents surveillance tool to listen to employees' conversations

Daily Mail

Walmart has raised the ire of privacy advocates with a new patent for an audio surveillance tool. The freshly filed patent describes the need for 'sounds sensors' and 'listening to the frontend' technology in its stores that can pick up on conversations between employees and customers. Using these recordings, Walmart would identify employees in the audio and study it to measure their performance at the company. 'A need exists for ways to capture the sounds resulting from people in the shopping facility and determine performance of employees based on those sounds,' Walmart explains in the patent, which was filed April 20, 2017 but only made public this week.


Eliciting Binary Performance Metrics

arXiv.org Machine Learning

Given a binary prediction problem, which performance metric should the classifier optimize? We address this question by formalizing the problem of metric elicitation. In particular, we focus on eliciting binary performance metrics from pairwise preferences, where users provide relative feedback for pairs of classifiers. By exploiting key properties of the space of confusion matrices, we obtain provably query efficient algorithms for eliciting linear and linear-fractional metrics. We further show that our method is robust to feedback and finite sample noise.
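
To make the terms concrete: a linear metric is a weighted sum of confusion-matrix entries, and a linear-fractional metric is a ratio of two such sums. The sketch below shows accuracy as a linear metric and F1 as a linear-fractional one, plus a simulated pairwise comparison. It does not implement the paper's elicitation algorithm; the class and function names are hypothetical, and the confusion entries are assumed to be rates that sum to 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix entries expressed as fractions of all samples."""
    tp: float
    fp: float
    fn: float
    tn: float

def linear(c, w_tp=0.0, w_fp=0.0, w_fn=0.0, w_tn=0.0, const=0.0):
    # Weighted sum of the confusion entries plus a constant.
    return w_tp * c.tp + w_fp * c.fp + w_fn * c.fn + w_tn * c.tn + const

def accuracy(c):
    # Linear metric: TP + TN.
    return linear(c, w_tp=1.0, w_tn=1.0)

def f1(c):
    # Linear-fractional metric: 2TP / (2TP + FP + FN).
    denominator = linear(c, w_tp=2.0, w_fp=1.0, w_fn=1.0)
    return linear(c, w_tp=2.0) / denominator if denominator else 0.0

def prefers_first(metric, c1, c2):
    """Simulated pairwise feedback: is classifier 1 at least as good as 2?"""
    return metric(c1) >= metric(c2)
```

In the setting the abstract describes, the metric itself is unknown and only answers like those returned by prefers_first are observed; here the metric is passed in explicitly just to show what such feedback looks like.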


Binary Classification with Karmic, Threshold-Quasi-Concave Metrics

arXiv.org Machine Learning

Complex performance measures, beyond the popular measure of accuracy, are increasingly being used in the context of binary classification. These complex performance measures are typically not even decomposable, that is, the loss evaluated on a batch of samples cannot typically be expressed as a sum or average of losses evaluated at individual samples, which in turn requires new theoretical and methodological developments beyond standard treatments of supervised learning. In this paper, we advance this understanding of binary classification for complex performance measures by identifying two key properties: a so-called Karmic property, and a more technical threshold-quasi-concavity property, which we show is milder than existing structural assumptions imposed on performance measures. Under these properties, we show that the Bayes optimal classifier is a threshold function of the conditional probability of positive class. We then leverage this result to come up with a computationally practical plug-in classifier, via a novel threshold estimator, and further, provide a novel statistical analysis of classification error with respect to complex performance measures.
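
The plug-in idea can be sketched as follows: take estimated class-1 probabilities, then pick the threshold that maximizes a non-decomposable measure such as F1 on held-out data. This is only an illustration under stated assumptions. The simple search over observed probabilities stands in for the paper's threshold estimator, which is not reproduced here, and all function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels encoded as 0/1; returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def best_threshold(y_true, probs, metric=f1_score):
    """Try each observed probability as a threshold and keep the best one.

    Ties are broken in favor of the smallest threshold.
    """
    best_t, best_value = None, float("-inf")
    for t in sorted(set(probs)):
        preds = [1 if p >= t else 0 for p in probs]
        value = metric(y_true, preds)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value

def plug_in_classifier(threshold):
    # Predict class 1 when the estimated probability reaches the threshold.
    return lambda prob: 1 if prob >= threshold else 0
```

Because F1 cannot be written as an average of per-sample losses, the threshold is chosen by evaluating the metric on the whole validation set at once rather than by fixing it at 0.5.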


Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation

arXiv.org Artificial Intelligence

A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q function from 3M play-by-play events in the National Hockey League (NHL). The neural network representation integrates both continuous context signals and game history, using a possession-based LSTM. The learned Q-function is used to value players' actions under different game contexts. To assess a player's overall performance, we introduce a novel Game Impact Metric (GIM) that aggregates the values of the player's actions. Empirical evaluation shows that GIM is consistent throughout a play season, and correlates highly with standard success measures and future salary.


A simple 2D CNN for MNIST digit recognition – Towards Data Science

#artificialintelligence

Convolutional Neural Networks (CNNs) are the current state-of-the-art architecture for image classification tasks. Whether it is facial recognition, self-driving cars or object detection, CNNs are being used everywhere. In this post, a simple 2-D Convolutional Neural Network (CNN) model is designed using Keras with a TensorFlow backend for the well-known MNIST digit recognition task. The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits (0 to 9).


Yet Another Caret Workshop

#artificialintelligence

You should always set the seed before calling train. Probably not the most amazing \(R^2\) value you have ever seen, but that's alright. Note that calling the model fit displays the most crucial information in a succinct way. Let's move on to a classification algorithm. It's good practice to start with a logistic regression and take it from there.