Performance Analysis
Machine Learning Approaches for Non-Intrusive Home Absence Detection Based on Appliance Electrical Use
Lentzas, Athanasios, Vrakas, Dimitris
Home absence detection is an emerging field in smart home installations. Identifying whether the residents of a house are present is important in numerous scenarios, including but not limited to elderly people living alone, people suffering from dementia, and home quarantine. The majority of published papers focus on pressure or door sensors or on cameras to detect outing events. Although these approaches provide solid results, they are intrusive and require modifications for sensor placement. In our work, appliance electrical use is investigated as a means of detecting the presence or absence of residents. The energy use is obtained through power disaggregation, a non-intrusive sensing method. Since no dataset providing both energy data and ground truth for home absence is available, artificial outing events were introduced into the UK-DALE dataset, a well-known dataset for Non-Intrusive Load Monitoring (NILM). Several machine learning algorithms were evaluated using the generated dataset. Benchmark results show that home absence detection using appliance power consumption is feasible.
How to Build a Complete Classification Model in R and caret
R is a programming language used mainly in statistics, but it also provides capable libraries for machine learning. In this tutorial, I describe how to implement a classification task using the caret package provided by R. The objective of this example is to predict heart attacks through a K-Neighbors Classifier. The example uses the hearts dataset, available on Kaggle under the CC0 Public Domain license. In my previous articles, I have already analyzed this dataset in Python using both scikit-learn and pycaret. In this article, I try to solve the same problem in R. As input features, I consider all the columns except the last one, named output, which I use as the target class.
Comparing Model Evaluation Techniques Part 2: Classification and Clustering - DataScienceCentral.com
In Part 1, I compared a few model evaluation techniques that fall under the umbrella of 'general statistical tools and tests'. Here in Part 2 I compare three of the more popular model evaluation techniques for classification and clustering: confusion matrix, gain and lift chart, and ROC curve. That said, you'll want to choose a method that gives you the answers you need for the particular field you're in. For example, while a confusion matrix can be a great tool for comparing models, it isn't much good for marketing decisions (where the gain and lift chart would be a better choice). Other less popular (but still valid) tools include the K-S chart and Gini Coefficient.
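To make the three techniques concrete, here is a minimal sketch in plain Python. The labels, scores, and the 0.5 threshold are made-up illustration values, and the helper names are hypothetical. The ROC curve is summarised by its area (AUC), computed as the share of positive/negative pairs that the scores rank correctly.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cumulative_gain(y_true, scores, fraction):
    """Share of all positives found in the top `fraction` of ranked cases."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = round(fraction * len(ranked))
    return sum(label for _, label in ranked[:k]) / sum(y_true)


# Made-up example data: true labels and model scores for eight cases.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

matrix = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
gain = cumulative_gain(y_true, scores, 0.5)
lift = gain / 0.5  # lift = gain relative to contacting cases at random

print(matrix, auc, gain, lift)
```

The gain and lift figures answer the marketing-style question (how many positives are reached by contacting only the top half of the ranked list), which the confusion matrix alone does not.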
Prognosis of Rotor Parts Fly-off Based on Cascade Classification and Online Prediction Ability Index
Shen, Yingjun, Song, Zhe, Kusiak, Andrew
Large rotating machines, such as compressors, steam turbines, and gas turbines, are critical equipment in many process industries such as energy, chemical, and power generation. Due to the high rotating speed and tremendous momentum of the rotor, the centrifugal force may cause rotor parts to fly apart, which poses a great threat to operational safety. Early detection and prediction of potential failures could prevent catastrophic plant downtime and economic loss. In this paper, we divide the operational states of a rotating machine into normal, risky, and high-risk ones based on the time to the moment of failure. Then a cascade classifying algorithm is proposed to predict the states in two steps: first, we judge whether the machine is in a normal or abnormal condition; then, for time periods predicted as abnormal, we further classify them into risky or high-risk states. Moreover, traditional classification model evaluation metrics, such as the confusion matrix and true-false accuracy, are static and neglect the online prediction dynamics and the uneven costs of wrong predictions. An Online Prediction Ability Index (OPAI) is proposed to select prediction models with consistent online predictions and smaller close-to-downtime prediction errors. Real-world data sets and computational experiments are used to verify the effectiveness of the proposed methods.
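The two-step structure described in the abstract can be sketched as follows. This is only an illustration of the cascade idea, not the authors' implementation: the time horizons (72 and 24 hours), the single vibration feature, and the threshold classifiers are all hypothetical stand-ins for trained models.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def label_state(hours_to_failure: float,
                risky_horizon: float = 72.0,
                high_risk_horizon: float = 24.0) -> str:
    """Label a time period by its distance to failure (horizons are assumed)."""
    if hours_to_failure <= high_risk_horizon:
        return "high-risk"
    if hours_to_failure <= risky_horizon:
        return "risky"
    return "normal"


def cascade_predict(x: Features,
                    is_abnormal: Callable[[Features], bool],
                    is_high_risk: Callable[[Features], bool]) -> str:
    """Stage 1 separates normal from abnormal; stage 2 splits abnormal cases."""
    if not is_abnormal(x):
        return "normal"
    return "high-risk" if is_high_risk(x) else "risky"


# Hypothetical stand-in classifiers: thresholds on a vibration amplitude x[0].
def stage1(x: Features) -> bool:
    return x[0] > 2.0


def stage2(x: Features) -> bool:
    return x[0] > 5.0


predictions = [cascade_predict([v], stage1, stage2) for v in (1.0, 3.5, 7.2)]
print(predictions)
```

Because the second classifier only ever sees periods flagged as abnormal, it can be trained on that subset alone, which is the usual motivation for a cascade.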
A technique for making quantum computing more resilient to noise, which boosts performance
Quantum computing continues to advance at a rapid pace, but one challenge that holds the field back is mitigating the noise that plagues quantum machines. This leads to much higher error rates compared to classical computers. This noise is often caused by imperfect control signals, interference from the environment, and unwanted interactions between qubits, which are the building blocks of a quantum computer. Performing computations on a quantum computer involves a "quantum circuit," which is a series of operations called quantum gates. These quantum gates, which are mapped to the individual qubits, change the quantum states of certain qubits, which then perform the calculations to solve a problem.
Why Precision and Recall metric?
Originally published on Towards AI. Why can't an accuracy in the 90% range decide the quality of your machine learning model?
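A small worked example shows why accuracy alone can mislead on imbalanced data. The counts below are invented for illustration: 1000 cases, of which 50 are positive.

```python
def scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# A model that always predicts "negative": it misses all 50 positives.
always_negative = scores(tp=0, fp=0, fn=50, tn=950)

# A model that finds 40 of the 50 positives at the cost of 60 false alarms.
useful_model = scores(tp=40, fp=60, fn=10, tn=890)

print(always_negative)
print(useful_model)
```

The always-negative model reaches 95% accuracy with zero recall, while the second model has slightly lower accuracy (93%) but recovers 80% of the positives with 40% precision.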
Accuracy versus interpretability? With generalized additive models (GAMs), you can have both
In this post, I will provide an overview of generalized additive models (GAMs) and their desirable features. Predictive accuracy has long been an important goal of machine learning. But model interpretability has received more attention in recent years. Stakeholders, such as executives, regulators, and domain experts, often want to understand how and why a model makes its predictions before they trust it enough to use it in practice. However, when you train a machine learning model, you typically face a tradeoff between accuracy and interpretability.
Gaussian Naive Bayes Explained and Hands-On with Scikit-Learn
Originally published on Towards AI.
The Dutch Draw: Constructing a Universal Baseline for Binary Prediction Models
van de Bijl, Etienne, Klein, Jan, Pries, Joris, Bhulai, Sandjai, Hoogendoorn, Mark, van der Mei, Rob
Novel prediction methods should always be compared to a baseline to know how well they perform. Without this frame of reference, the performance score of a model is basically meaningless. What does it mean when a model achieves an $F_1$ of 0.8 on a test set? A proper baseline is needed to evaluate the 'goodness' of a performance score. Comparing with the latest state-of-the-art model is usually insightful. However, being state-of-the-art can change rapidly when newer models are developed. Contrary to an advanced model, a simple dummy classifier could be used. However, the latter could be beaten too easily, making the comparison less valuable. This paper presents a universal baseline method for all binary classification models, named the Dutch Draw (DD). This approach weighs simple classifiers and determines the best classifier to use as a baseline. We theoretically derive the DD baseline for many commonly used evaluation measures and show that in most situations it reduces to (almost) always predicting either zero or one. Summarizing, the DD baseline is: (1) general, as it is applicable to all binary classification problems; (2) simple, as it is quickly determined without training or parameter-tuning; (3) informative, as insightful conclusions can be drawn from the results. The DD baseline serves two purposes. First, to enable comparisons across research papers by this robust and universal baseline. Second, to provide a sanity check during the development process of a prediction model. It is a major warning sign when a model is outperformed by the DD baseline.
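The idea can be illustrated with a rough sketch. It assumes a simple family of dummy classifiers that label k randomly chosen samples (out of M) as positive, and it uses expected-score formulas derived here from that assumption rather than copied from the paper. The function names and the example counts (20 positives in 100 samples) are hypothetical.

```python
def expected_accuracy(k, positives, total):
    """Expected accuracy when k of `total` samples are labelled positive at random."""
    expected_tp = k * positives / total
    negatives = total - positives
    # TN = negatives - FP and FP = k - TP, so TP + TN = 2*TP + negatives - k.
    return (2 * expected_tp + negatives - k) / total


def expected_f1(k, positives, total):
    """Expected F1 for the same random labelling (requires positives > 0)."""
    expected_tp = k * positives / total
    # F1 = 2*TP / (actual positives + predicted positives), linear in TP.
    return 2 * expected_tp / (positives + k)


def dummy_baseline(expected_score, positives, total):
    """Pick the k that maximises the expected score over all dummy classifiers."""
    best_k = max(range(total + 1),
                 key=lambda k: expected_score(k, positives, total))
    return best_k, expected_score(best_k, positives, total)


# Hypothetical test set: 20 positives among 100 samples.
acc_k, acc_baseline = dummy_baseline(expected_accuracy, 20, 100)
f1_k, f1_baseline = dummy_baseline(expected_f1, 20, 100)

print(acc_k, acc_baseline)  # best k for accuracy and its expected score
print(f1_k, f1_baseline)    # best k for F1 and its expected score
```

In this example the best dummy classifier for accuracy predicts all zeros and the best one for $F_1$ predicts all ones, matching the abstract's remark that the baseline usually reduces to always predicting a single class. Under these assumptions, an $F_1$ of 0.8 would sit well above the baseline of about 0.33.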
Satellite Monitoring of Terrestrial Plastic Waste
Kruse, Caleb, Boyda, Edward, Chen, Sully, Karra, Krishna, Bou-Nahra, Tristan, Hammer, Dan, Mathis, Jennifer, Maddalene, Taylor, Jambeck, Jenna, Laurier, Fabien
Plastic waste is a significant environmental pollutant that is difficult to monitor. We created a system of neural networks to analyze spectral, spatial, and temporal components of Sentinel-2 satellite data to identify terrestrial aggregations of waste. The system works at continental scale. We evaluated performance in Indonesia and detected 374 waste aggregations, more than double the number of sites found in public databases. The same system deployed across twelve countries in Southeast Asia identifies 996 subsequently confirmed waste sites. For each detected site, we algorithmically monitor waste site footprints through time and cross-reference other datasets to generate physical and social metadata. 19% of detected waste sites are located within 200 m of a waterway. Numerous sites sit directly on riverbanks, with high risk of ocean leakage.