Statistical Learning


An Upgraded Marketing Mix Modeling in Python

#artificialintelligence

In my last article, I introduced you to the world of marketing mix modeling. If you have not read it yet, please do so before you proceed. There, we created a linear regression model that predicts sales based on raw advertising spends in several advertising channels, such as TV, radio, and web banners. For me as a machine learning practitioner, such a model is already nice on its own. Even better, it also makes business people happy because the model lets us calculate ROIs, allowing us to judge how well each channel performed.


Introduction to Polynomial Regression Analysis

#artificialintelligence

Polynomial regression is one of the machine learning algorithms used for making predictions. For example, polynomial regression is widely applied to predict the spread rate of COVID-19 and other infectious diseases. If you would like to learn more about what polynomial regression analysis is, continue reading. Regression analysis is a helpful statistical tool for studying the relationship between two sets of events, or, statistically speaking, variables: a dependent variable and one or more independent variables. For example, your weight loss (dependent variable) depends on the number of hours you spend in the gym (independent variable).
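As a minimal sketch of the idea, the snippet below fits a degree-2 polynomial to a handful of points with NumPy. The data is hypothetical and noise-free (generated from y = 2x^2 - 3x + 1), and the choice of degree 2 is an assumption made for illustration, not something taken from the article.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 2x^2 - 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # independent variable
y = 2 * x**2 - 3 * x + 1                  # dependent variable

# Least-squares fit of a degree-2 polynomial.
# Coefficients are returned from the highest power to the lowest.
coeffs = np.polyfit(x, y, deg=2)

# Use the fitted polynomial to predict the dependent variable at a new point
y_new = np.polyval(coeffs, 5.0)

print("coefficients:", coeffs)
print("prediction at x = 5:", y_new)
```

With real, noisy data the recovered coefficients would only approximate the underlying relationship, and the degree would normally be chosen by validation rather than fixed in advance.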


ML-Logistic Regression

#artificialintelligence

There are other optimization algorithms than gradient descent. These algorithms automatically pick the appropriate learning rate alpha, and are usually faster. One way to handle more than two classes is "one vs all" binary classification. For each class, we run a binary classification of that class against all the other classes, and then select the class with the largest hypothesis output. Since we have 3 classes here, we do the binary classification 3 times.
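The one-vs-all procedure can be sketched as follows. This is an illustrative example, not the article's code: the toy data set, the function names, and the use of plain batch gradient descent with a fixed learning rate are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.1, steps=5000):
    """Fit logistic regression weights with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def one_vs_all_fit(X, labels, classes):
    # One binary problem per class: that class (1) against all the others (0)
    return np.array([train_binary(X, (labels == c).astype(float)) for c in classes])

def one_vs_all_predict(W, X, classes):
    # Hypothesis output of every binary classifier; pick the largest per sample
    scores = sigmoid(X @ W.T)
    return classes[np.argmax(scores, axis=1)]

# Hypothetical toy data: three well-separated groups of 2-D points
points = np.array([[0, 0], [0, 1], [1, 0],
                   [5, 0], [5, 1], [6, 0],
                   [0, 5], [1, 5], [0, 6]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
classes = np.array([0, 1, 2])

X = np.column_stack([np.ones(len(points)), points])  # prepend a bias column
W = one_vs_all_fit(X, labels, classes)                # 3 classes, so 3 binary fits
predictions = one_vs_all_predict(W, X, classes)
print(predictions)
```

Each row of `W` holds the weights of one binary classifier, so adding a class only adds one more row and one more training run.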


11 Machine Learning Project Ideas for Beginners

#artificialintelligence

Machine learning is broad and applicable in many fields. So you might get lost trying to find a foothold as a beginner. Nonetheless, taking up projects while learning helps you decipher your interests and focus on a specific path. Additionally, it lets you familiarize yourself with the typical machine learning workflow. Here, we'll show you some of the best beginner project ideas that'll help you dive deeper into the nitty-gritty of machine learning.


Amazon SageMaker tutorial and model

#artificialintelligence

This code pattern describes a way to gain insights by using Watson OpenScale and a SageMaker machine learning model. It explains how to create a logistic regression model using Amazon SageMaker with data from the UC Irvine machine learning database. The pattern uses Watson OpenScale to bind the machine learning model deployed in the AWS cloud, create a subscription, and perform payload and feedback logging. With Watson OpenScale, you can monitor model quality and log payloads, regardless of where the model is hosted. This code pattern uses the example of an Amazon Web Service (AWS) SageMaker model, which demonstrates the independent and open nature of Watson OpenScale.


Improved genetic algorithm and XGBoost classifier for power transformer fault diagnosis

#artificialintelligence

The power transformer is an essential component for the stable and reliable operation of the electrical power grid. Traditional diagnostic methods based on dissolved gas analysis (DGA) have been used to identify power transformer faults. However, the application of these methods is limited due to the low accuracy of fault identification. In this paper, a transformer fault diagnosis system is developed based on the combination of an improved genetic algorithm (IGA) and XGBoost. In the transformer fault diagnosis system, the improved genetic algorithm is employed to pre-select the input features from the DGA data and optimize the XGBoost classifier. Performance measures such as average unfitness value, likelihood of evolution leap, and likelihood of optimality are used to validate the efficacy of the proposed improved genetic algorithm. The results of simulation experiments show that the improved genetic algorithm can get the optimal solution stably and reliably, and the proposed method improves the average accuracy of transformer fault diagnosis to 99.2%. Compared to IEC ratios, dual triangle, support vector machine (SVM), and common vector approach (CVA), the diagnostic accuracy of the proposed method is improved by 30.2%, 47.2%, 11.2%, and 3.6%, respectively. The proposed method can be a potential solution to identify the transformer fault types.


Matrix Profile-based Interpretable Time Series Classifier

#artificialintelligence

Time series classification is a pervasive and transversal problem in various fields ranging from disease diagnosis to anomaly detection in finance. Unfortunately, the most effective models used by Artificial Intelligence systems for time series classification are not interpretable and hide the logic of the decision process, making them unusable in sensitive domains. Recent research is focusing on explanation methods that are paired with the obscure classifier to compensate for this weakness. However, a time series classification approach that is transparent by design, and that is simultaneously efficient and effective, is even more preferable. To this aim, we propose an interpretable time series classification method based on the patterns that can be extracted from the Matrix Profile of the time series in the training set. A smart design of the classification procedure yields an efficient and effective transparent classifier modeled as a decision tree that expresses the reasons for the classification as the presence of discriminative subsequences. Quantitative and qualitative experimentation shows that the proposed method outperforms state-of-the-art interpretable approaches.


Types of Multi Classification

#artificialintelligence

This blog introduces different types of multi classification systems. Unlike binary classifiers, multiclass classifiers can distinguish between more than two classes. Stochastic gradient descent (SGD) classifiers, Random Forest classifiers, naive Bayes classifiers, and others are capable of handling multiple classes natively. On the other hand, Logistic Regression or Support Vector Machine classifiers are strictly binary classifiers. There are various strategies that you can use to perform multiclass classification with multiple binary classifiers.


Why we cannot trust R-Squared?

#artificialintelligence

R-Squared is used to assess the prediction performance of a machine learning model. It is calculated as 1 minus the ratio between SSR (sum of squared residuals) and SST (total sum of squares), where SSR is the sum of squared errors between the predicted values and the actual values, while SST is the sum of squared errors between the average value and the actual values. The ideal case is for SSR to equal zero, which makes R-Squared equal to 1; the closer R-Squared is to 1, the better. Theoretically, the value of R-Squared will never decrease no matter how many irrelevant features are added to the model. This is because, even though a feature may be irrelevant, it might still have some random correlation with the target variable that helps the prediction a little bit.
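The definition and the "never decreases" behavior can be checked with a short sketch. It assumes an ordinary least-squares fit with an intercept, evaluated on the same data it was trained on. The synthetic data, the random seed, and the helper names `r_squared` and `fit_predict` are hypothetical choices made for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    ssr = np.sum((y_true - y_pred) ** 2)          # predicted vs. actual
    sst = np.sum((y_true - y_true.mean()) ** 2)   # average vs. actual
    return 1.0 - ssr / sst

def fit_predict(X, y):
    """Ordinary least squares with an intercept; returns in-sample predictions."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ beta

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + rng.normal(size=50)        # target depends on x only
irrelevant = rng.normal(size=50)         # feature unrelated to the target

r2_base = r_squared(y, fit_predict(x.reshape(-1, 1), y))
r2_more = r_squared(y, fit_predict(np.column_stack([x, irrelevant]), y))

print("R-Squared with x only:            ", r2_base)
print("R-Squared with irrelevant feature:", r2_more)
```

Because the smaller model is a special case of the larger one (set the extra coefficient to zero), the least-squares fit of the larger model cannot have a higher SSR on the training data, so its in-sample R-Squared cannot be lower.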


Supervised Learning algorithms cheat-sheet

#artificialintelligence

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used later for mapping new examples. The most popular supervised learning tasks are regression and classification. The result of solving the regression task is a model that can make numerical predictions. The result of solving the classification task is a model that can make class predictions.