Accuracy
Comparing Model Evaluation Techniques Part 2: Classification and Clustering - DataScienceCentral.com
In Part 1, I compared a few model evaluation techniques that fall under the umbrella of 'general statistical tools and tests'. Here in Part 2, I compare three of the more popular model evaluation techniques for classification and clustering: the confusion matrix, the gain and lift chart, and the ROC curve. Whichever you use, choose a method that answers the questions that matter in your particular field. For example, while a confusion matrix can be a great tool for comparing models, it isn't much use for marketing decisions, where a gain and lift chart is a better choice. Other less popular (but still valid) tools include the K-S chart and the Gini coefficient.
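The three techniques named above can be sketched in plain Python for a binary classifier. This is a minimal illustration, not the article's code: the labels, scores, 0.5 threshold and function names are hypothetical, and ROC AUC is computed with the pairwise-ranking formulation (which equals the area under the ROC curve) instead of tracing the curve point by point.

```python
# Dependency-free sketches of three evaluation techniques for a binary classifier.

def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def gain_and_lift(y_true, scores, fraction):
    """Cumulative gain and lift when targeting the top `fraction` of cases by score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    positives_in_top = sum(label for _, label in ranked[:n_top])
    total_positives = sum(y_true)
    gain = positives_in_top / total_positives  # share of all positives captured
    lift = (positives_in_top / n_top) / (total_positives / len(y_true))  # vs. random targeting
    return gain, lift


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_matrix(y_true, y_pred))    # (3, 1, 1, 3)
print(gain_and_lift(y_true, scores, 0.5))  # (0.75, 1.5)
print(roc_auc(y_true, scores))             # 0.6875
```

The confusion matrix depends on the chosen threshold, while gain/lift and ROC AUC use only the ranking of the scores, which is why the latter suit "whom should we target first?" questions such as marketing.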
Prognosis of Rotor Parts Fly-off Based on Cascade Classification and Online Prediction Ability Index
Shen, Yingjun, Song, Zhe, Kusiak, Andrew
Large rotating machines, e.g., compressors, steam turbines, and gas turbines, are critical equipment in many process industries such as energy, chemical, and power generation. Because of the rotor's high rotating speed and tremendous momentum, centrifugal force may cause rotor parts to fly apart, which poses a serious threat to operational safety. Early detection and prediction of potential failures could prevent catastrophic plant downtime and economic loss. In this paper, we divide the operational states of a rotating machine into normal, risky, and high-risk states based on the time remaining until failure. We then propose a cascade classification algorithm that predicts the states in two steps: first, we judge whether the machine is in a normal or abnormal condition; then, time periods predicted as abnormal are further classified as risky or high-risk. Moreover, traditional classification model evaluation metrics, such as the confusion matrix and true-false accuracy, are static: they neglect the dynamics of online prediction and the unequal costs of different wrong predictions. We propose an Online Prediction Ability Index (OPAI) to select prediction models that make consistent online predictions and smaller prediction errors close to downtime. Real-world data sets and computational experiments are used to verify the effectiveness of the proposed methods.
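The two-step cascade described above can be sketched as follows. This is an illustration of the cascade structure only, not the authors' model: the single feature (imagined as a vibration amplitude), the decision-stump base learner and the data are all hypothetical. The point is that stage 2 is trained and applied only on periods that stage 1 considers abnormal.

```python
# Two-stage cascade: stage 1 separates normal from abnormal periods,
# stage 2 is trained only on abnormal periods to split risky from high-risk.

def fit_stump(xs, ys):
    """Hypothetical base learner: threshold t maximizing accuracy of 'predict 1 if x > t'."""
    candidates = sorted(set(xs))
    best_t, best_acc = None, -1.0
    for t in [candidates[0] - 1.0] + candidates:
        acc = sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def fit_cascade(xs, states):
    # Stage 1 uses every sample: label 1 means abnormal (risky or high-risk).
    t1 = fit_stump(xs, [0 if s == "normal" else 1 for s in states])
    # Stage 2 uses only abnormal samples: label 1 means high-risk.
    abnormal = [(x, s) for x, s in zip(xs, states) if s != "normal"]
    t2 = fit_stump([x for x, _ in abnormal], [1 if s == "high-risk" else 0 for _, s in abnormal])
    return t1, t2


def predict_cascade(model, x):
    t1, t2 = model
    if x <= t1:
        return "normal"  # stage 1 says normal: stop here
    return "high-risk" if x > t2 else "risky"  # stage 2 runs only on abnormal periods


# Hypothetical 1-D feature values with state labels.
xs = [1.0, 1.5, 2.0, 4.5, 5.0, 5.5, 8.0, 9.0]
states = ["normal"] * 3 + ["risky"] * 3 + ["high-risk"] * 2
model = fit_cascade(xs, states)
print(model)                                                 # (2.0, 5.5)
print([predict_cascade(model, x) for x in [1.2, 3.0, 7.0]])  # ['normal', 'risky', 'high-risk']
```

In practice each stage could be any classifier; splitting the problem lets the second model specialize on the rarer abnormal periods.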
A technique for making quantum computing more resilient to noise, which boosts performance
Quantum computing continues to advance at a rapid pace, but one challenge that holds the field back is mitigating the noise that plagues quantum machines. This noise leads to much higher error rates than in classical computers. It is often caused by imperfect control signals, interference from the environment, and unwanted interactions between qubits, which are the building blocks of a quantum computer. Performing computations on a quantum computer involves a "quantum circuit," which is a series of operations called quantum gates. Each quantum gate is mapped to specific qubits and changes their quantum states; together, these state changes carry out the calculations that solve a problem.
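A single-qubit state-vector simulation shows how a gate changes a qubit's state and how an imperfect control signal becomes an error. This is a toy sketch, not the technique from the article: the RY rotation gate and the 0.1-radian over-rotation standing in for "noise" are illustrative assumptions.

```python
import math

def apply_gate(gate, state):
    """Multiply a 2x2 gate matrix into a single-qubit state vector [amp0, amp1]."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def ry(theta):
    """Rotation about the Y axis (real-valued matrix); RY(pi) flips |0> to |1>."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

zero = [1.0, 0.0]  # the |0> state

ideal = apply_gate(ry(math.pi), zero)
noisy = apply_gate(ry(math.pi + 0.1), zero)  # hypothetical 0.1 rad over-rotation

p1_ideal = abs(ideal[1]) ** 2  # probability of measuring 1
p1_noisy = abs(noisy[1]) ** 2
print(round(p1_ideal, 6), round(p1_noisy, 6))  # 1.0 0.997502
```

A small control error of this kind costs only about 0.25% per gate here, but such errors compound over the many gates of a real circuit.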
Why Precision and Recall Metrics?
Originally published on Towards AI the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. Why can't 90% accuracy tell you how well your machine learning model really performs?
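A small worked example shows why a high accuracy alone can be misleading on imbalanced data. The data set and the "always predict the majority class" model below are hypothetical; reporting precision as 0.0 when there are no positive predictions is a convention chosen here, since precision is undefined in that case.

```python
def accuracy_precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # Precision is undefined with no positive predictions; report 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical imbalanced test set: 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90
# A useless "model" that always predicts the majority class.
y_pred = [0] * 100
print(accuracy_precision_recall(y_true, y_pred))  # (0.9, 0.0, 0.0)
```

The model scores 90% accuracy while finding none of the positives, which precision and recall expose immediately.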
Accuracy versus interpretability? With generalized additive models (GAMs), you can have both
In this post, I will provide an overview of generalized additive models (GAMs) and their desirable features. Predictive accuracy has long been an important goal of machine learning. But model interpretability has received more attention in recent years. Stakeholders, such as executives, regulators, and domain experts, often want to understand how and why a model makes its predictions before they trust it enough to use it in practice. However, when you train a machine learning model, you typically face a tradeoff between accuracy and interpretability.
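The interpretability of a GAM comes from its additive structure: the model's output on the link scale is an intercept plus one function per feature, so every prediction splits into per-feature contributions. The sketch below assumes a logistic link and uses hand-written shape functions as hypothetical stand-ins for the smooth functions a fitted GAM would learn; the feature names and coefficients are made up.

```python
import math

# Hypothetical shape functions, one per feature, standing in for the smooth
# functions a GAM would learn from data.
shape_functions = {
    "age": lambda a: 0.05 * (a - 40),
    "hours_per_week": lambda h: 0.0 if h <= 40 else 0.02 * (h - 40),  # flat, then rising
}
intercept = -1.0

def explain(x):
    """Return the per-feature contributions, the logit, and the predicted probability."""
    contributions = {name: f(x[name]) for name, f in shape_functions.items()}
    logit = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))  # logistic link
    return contributions, logit, probability

contributions, logit, probability = explain({"age": 50, "hours_per_week": 50})
print(contributions)  # each feature's additive effect on the logit
print(round(logit, 6), round(probability, 4))  # -0.3 0.4256
```

Because each shape function can be plotted on its own, stakeholders can inspect exactly how each feature moves the prediction, while the functions themselves can still be nonlinear.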
Gaussian Naive Bayes Explained and Hands-On with Scikit-Learn
Originally published on Towards AI.
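Gaussian Naive Bayes models each feature within each class as an independent normal distribution and predicts the class with the highest log posterior. scikit-learn provides this as `GaussianNB` in `sklearn.naive_bayes`; the plain-Python sketch below spells out the same idea on hypothetical data. The small constant `eps` added to each variance is a simplification chosen here and is not the exact smoothing rule scikit-learn uses.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate a prior, per-feature means, and per-feature variances for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + eps
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the largest log prior + sum of Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)

# Hypothetical two-feature training data for two classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 4.2], [2.8, 3.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]), predict_gaussian_nb(model, [3.1, 4.1]))  # 0 1
```

Working in log space avoids underflow when many small likelihoods are multiplied together.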
The Dutch Draw: Constructing a Universal Baseline for Binary Prediction Models
van de Bijl, Etienne, Klein, Jan, Pries, Joris, Bhulai, Sandjai, Hoogendoorn, Mark, van der Mei, Rob
Novel prediction methods should always be compared to a baseline to know how well they perform. Without this frame of reference, the performance score of a model is basically meaningless. What does it mean when a model achieves an $F_1$ of 0.8 on a test set? A proper baseline is needed to evaluate the 'goodness' of a performance score. Comparing with the latest state-of-the-art model is usually insightful. However, the state of the art can change rapidly as newer models are developed. Instead of an advanced model, a simple dummy classifier could be used. However, such a classifier could be beaten too easily, making the comparison less valuable. This paper presents a universal baseline method for all binary classification models, named the Dutch Draw (DD). This approach weighs simple classifiers and determines the best classifier to use as a baseline. We theoretically derive the DD baseline for many commonly used evaluation measures and show that in most situations it reduces to (almost) always predicting either zero or one. In summary, the DD baseline is: (1) general, as it is applicable to all binary classification problems; (2) simple, as it is quickly determined without training or parameter-tuning; (3) informative, as insightful conclusions can be drawn from the results. The DD baseline serves two purposes: first, to enable comparisons across research papers through this robust and universal baseline; second, to provide a sanity check during the development of a prediction model. It is a major warning sign when a model is outperformed by the DD baseline.
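A baseline of this kind can be computed exactly for simple measures. The sketch below rests on an assumption about the DD family: the "simple classifiers" label exactly k of the M test instances positive, chosen uniformly at random, and the baseline is the best expected score over k. It is an illustration under that reading, not the authors' code. The expected number of true positives is then the hypergeometric mean kP/M (P = actual positives), and both accuracy and F1 are linear in TP for fixed k, so their expectations follow directly.

```python
from fractions import Fraction

def expected_scores(M, P, k):
    """Exact expected accuracy and F1 when k of M instances are labeled positive at random.

    P is the number of actual positives among the M instances.
    """
    e_tp = Fraction(k * P, M)                # hypergeometric mean of TP
    e_accuracy = (2 * e_tp + M - P - k) / M  # accuracy = (2TP + M - P - k) / M for fixed k
    # F1 = 2TP / (P + k): the denominator is fixed for fixed k, so E[F1] = 2E[TP] / (P + k).
    e_f1 = 2 * e_tp / (P + k) if P + k > 0 else Fraction(0)
    return {"accuracy": e_accuracy, "f1": e_f1}

def dutch_draw_baseline(M, P, measure):
    """Return the k that maximizes the expected score, and that score."""
    best_k = max(range(M + 1), key=lambda k: expected_scores(M, P, k)[measure])
    return best_k, expected_scores(M, P, best_k)[measure]

print(dutch_draw_baseline(100, 30, "accuracy"))  # (0, Fraction(7, 10)): always predict 0
print(dutch_draw_baseline(100, 30, "f1"))        # (100, Fraction(6, 13)): always predict 1
```

With 30 positives out of 100, the best expected accuracy comes from always predicting zero and the best expected F1 from always predicting one, matching the abstract's observation that the baseline usually reduces to a constant prediction. A model whose F1 on this test set is below 6/13 would fail the sanity check.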
Satellite Monitoring of Terrestrial Plastic Waste
Kruse, Caleb, Boyda, Edward, Chen, Sully, Karra, Krishna, Bou-Nahra, Tristan, Hammer, Dan, Mathis, Jennifer, Maddalene, Taylor, Jambeck, Jenna, Laurier, Fabien
Plastic waste is a significant environmental pollutant that is difficult to monitor. We created a system of neural networks to analyze spectral, spatial, and temporal components of Sentinel-2 satellite data to identify terrestrial aggregations of waste. The system works at continental scale. We evaluated performance in Indonesia and detected 374 waste aggregations, more than double the number of sites found in public databases. The same system deployed across twelve countries in Southeast Asia identifies 996 subsequently confirmed waste sites. For each detected site, we algorithmically monitor waste site footprints through time and cross-reference other datasets to generate physical and social metadata. 19% of detected waste sites are located within 200 m of a waterway. Numerous sites sit directly on riverbanks, with high risk of ocean leakage.
Addressing Missing Sources with Adversarial Support-Matching
Kehrenberg, Thomas, Bartlett, Myles, Sharmanska, Viktoriia, Quadrianto, Novi
When trained on diverse labeled data, machine learning models have proven themselves to be a powerful tool in all facets of society. However, due to budget limitations, deliberate or non-deliberate censorship, and other problems during data collection and curation, the labeled training set might exhibit a systematic shortage of data for certain groups. We investigate a scenario in which the absence of certain data is linked to the second level of a two-level hierarchy in the data. Inspired by the idea of protected groups from algorithmic fairness, we refer to the partitions carved by this second level as "subgroups"; we refer to combinations of subgroups and classes, or leaves of the hierarchy, as "sources". To characterize the problem, we introduce the concept of classes with incomplete subgroup support. The representational bias in the training set can give rise to spurious correlations between the classes and the subgroups, which prevent standard classification models from generalizing to unseen sources. To overcome this bias, we make use of an additional, diverse but unlabeled dataset, called the "deployment set", to learn a representation that is invariant to subgroup. This is done by adversarially matching the support of the training and deployment sets in representation space. In order to learn the desired invariance, it is paramount that the sets of samples observed by the discriminator are balanced by class; this is easily achieved for the training set, but requires using semi-supervised clustering for the deployment set. We demonstrate the effectiveness of our method with experiments on several datasets and variants of the problem.
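The balancing requirement for the discriminator's inputs can be illustrated on its own. The sketch below covers only class-balanced sampling, not the adversarial support-matching model; the cluster labels for the deployment set are hypothetical placeholders for the output of the semi-supervised clustering step the authors describe, and `balanced_batch` is a name introduced here for illustration.

```python
import random
from collections import Counter, defaultdict

def balanced_batch(group_labels, per_group, rng):
    """Sample indices so that every group contributes exactly `per_group` items.

    Sampling is with replacement, so small groups can still fill their quota.
    """
    indices_by_group = defaultdict(list)
    for idx, group in enumerate(group_labels):
        indices_by_group[group].append(idx)
    batch = []
    for group in sorted(indices_by_group):
        batch.extend(rng.choices(indices_by_group[group], k=per_group))
    return batch

rng = random.Random(0)
# Training set: balanced using its known (and heavily imbalanced) class labels.
train_classes = [0] * 95 + [1] * 5
# Deployment set: no labels, so hypothetical cluster assignments stand in for classes.
deploy_clusters = [0] * 40 + [1] * 60
train_batch = balanced_batch(train_classes, per_group=8, rng=rng)
deploy_batch = balanced_batch(deploy_clusters, per_group=8, rng=rng)
print(Counter(train_classes[i] for i in train_batch))
print(Counter(deploy_clusters[i] for i in deploy_batch))
```

Both batches contain 8 items per class (or cluster), so differences the discriminator detects cannot simply reflect different class proportions in the two sets.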
All About Logistic Regression
Originally published on Towards AI. Logistic Regression is a supervised machine learning algorithm used for classification problems, where the dependent variable must be assigned to one of two or more categories (classes) based on the independent variables.
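A minimal binary logistic regression shows how an independent variable is mapped to a class probability through the sigmoid function. This sketch trains by batch gradient descent on the mean log-loss; the data, learning rate and iteration count are hypothetical choices for illustration, and libraries typically use faster solvers and regularization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]  # dLoss/dz per sample
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical one-feature training data with two classes.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)

def predict(x):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in xs])
```

The 0.5 cut-off corresponds to the decision boundary w*x + b = 0; with more than two classes, the same idea generalizes to multinomial (softmax) regression.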