Performance Analysis


Building a machine learning classifier model for diabetes

#artificialintelligence

The Pima Indians of Arizona and Mexico have the highest reported prevalence of diabetes of any population in the world. A small study was conducted to analyse their medical records and assess whether the onset of diabetes can be predicted from diagnostic measures. The dataset was downloaded from Kaggle; all patients included are females of Pima Indian heritage who are at least 21 years old. The objective of this project is to build a machine learning model that predicts, from diagnostic measurements, whether a patient has diabetes. This is a binary (two-class) supervised classification problem.
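
To make the binary classification setup concrete, here is a minimal sketch of logistic regression trained by gradient descent in plain Python. It is not the project's model: the two features (glucose and BMI) and all eight records are invented for illustration and are not rows from the Kaggle dataset, and logistic regression is simply one common choice of binary classifier.

```python
import math

# Hypothetical toy records (glucose in mg/dL, BMI) -> 1 if diabetic, else 0.
# Values are invented for illustration; they are NOT rows from the Kaggle dataset.
DATA = [
    ((85.0, 22.0), 0), ((90.0, 24.5), 0), ((100.0, 26.0), 0), ((95.0, 23.0), 0),
    ((150.0, 33.0), 1), ((165.0, 35.5), 1), ((140.0, 31.0), 1), ((180.0, 38.0), 1),
]


def make_scaler(rows):
    """Return a function that standardizes a row with per-column mean and std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return lambda row: [(v - m) / s for v, m, s in zip(row, means, stds)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Full-batch gradient descent on the logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


scale = make_scaler([x for x, _ in DATA])
X = [scale(x) for x, _ in DATA]
y = [label for _, label in DATA]
w, b = train_logistic(X, y)


def predict(raw_features):
    """Classify a raw (glucose, BMI) pair as 1 (diabetic) or 0."""
    z = sum(wj * xj for wj, xj in zip(w, scale(raw_features))) + b
    return 1 if sigmoid(z) >= 0.5 else 0


accuracy = sum(predict(x) == label for x, label in DATA) / len(DATA)
print(f"training accuracy: {accuracy:.2f}")
print("new patient (170, 36):", predict((170.0, 36.0)))
```

Standardizing the features first keeps gradient descent well conditioned; with real data you would also hold out a test set rather than reporting training accuracy.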


Developing a business strategy by combining machine learning with sensitivity analysis - Amazon Web Services

#artificialintelligence

Machine learning (ML) is routinely used by countless businesses to assist with decision making. In most cases, however, the predictions and business decisions made by ML systems still require the intuition of human users to make judgment calls. In this post, I show how to combine ML with sensitivity analysis to develop a data-driven business strategy. This post focuses on customer churn (that is, the defection of customers to competitors), while covering problems that often arise when using ML-based analysis. These problems include difficulties with handling incomplete and unbalanced data, deriving strategic options, and quantitatively evaluating the potential impact of those options.
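
One simple form of sensitivity analysis is one-at-a-time perturbation: shift a single input (for example, the monthly charge, as a proxy for a discount strategy), hold everything else fixed, and measure how the model's predicted churn changes. The sketch below assumes a hypothetical logistic churn model; the feature names, coefficients, and customers are invented for illustration and are not taken from the AWS post, which may use a different model and analysis.

```python
import math

# Hypothetical churn model: coefficients and customers are invented for
# illustration and are NOT taken from the AWS post.
COEFS = {"monthly_charge": 0.04, "support_calls": 0.5, "tenure_months": -0.08}
INTERCEPT = -2.0

CUSTOMERS = [
    {"monthly_charge": 70.0, "support_calls": 1, "tenure_months": 24},
    {"monthly_charge": 95.0, "support_calls": 4, "tenure_months": 6},
    {"monthly_charge": 50.0, "support_calls": 0, "tenure_months": 48},
    {"monthly_charge": 85.0, "support_calls": 2, "tenure_months": 12},
]


def churn_probability(customer):
    """Logistic score standing in for a trained churn classifier."""
    z = INTERCEPT + sum(coef * customer[name] for name, coef in COEFS.items())
    return 1.0 / (1.0 + math.exp(-z))


def mean_churn(customers):
    return sum(churn_probability(c) for c in customers) / len(customers)


def one_at_a_time_sensitivity(customers, feature, deltas):
    """Shift one feature by each delta (others fixed) and report the change
    in mean predicted churn relative to the unmodified customers."""
    baseline = mean_churn(customers)
    return {
        d: mean_churn([{**c, feature: c[feature] + d} for c in customers]) - baseline
        for d in deltas
    }


# Candidate strategies: lower or raise the monthly charge by $10.
effects = one_at_a_time_sensitivity(CUSTOMERS, "monthly_charge", [-10.0, 0.0, 10.0])
for delta, change in effects.items():
    print(f"monthly_charge {delta:+.0f}: mean churn change {change:+.4f}")
```

Each strategic option becomes a perturbation, and its estimated impact is the difference from the baseline prediction; this captures only what the model has learned, not causal effects.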


When Cross-Validation is More Powerful than Regularization

#artificialintelligence

Regularization is a way of avoiding overfitting by restricting the magnitude of model coefficients (or, in deep learning, node weights). A simple example of regularization is the use of ridge or lasso regression to fit linear models in the presence of collinear variables or (quasi-)separation. The intuition is that smaller coefficients are less sensitive to idiosyncrasies in the training data and hence less likely to overfit. Cross-validation is a way to safely reuse training data in nested model situations. This includes both the case of setting hyperparameters before fitting a model and the case of fitting models (let's call them base learners) that are then used as variables in downstream models, as shown in Figure 1.
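
Both ideas meet in the first use case above: cross-validation can choose the regularization strength before the final fit. The sketch below uses closed-form ridge regression for a single feature with no intercept, β = Σxy / (Σx² + λ), and picks λ by k-fold cross-validated mean squared error. The data (y = 2x plus Gaussian noise), the penalty grid, and the fold assignment (every k-th point) are assumptions for illustration only.

```python
import random


def ridge_fit_1d(xs, ys, lam):
    """Closed-form ridge slope for one feature, no intercept:
    beta = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5):
    """k-fold cross-validated mean squared error for penalty lam."""
    n = len(xs)
    sq_err = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point goes in this fold
        train = [(xs[i], ys[i]) for i in range(n) if i not in test_idx]
        beta = ridge_fit_1d([x for x, _ in train], [y for _, y in train], lam)
        sq_err += sum((ys[i] - beta * xs[i]) ** 2 for i in test_idx)
    return sq_err / n


def select_penalty(xs, ys, grid, k=5):
    """Pick the penalty with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))


# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 100.0, 1e6]
best = select_penalty(xs, ys, grid)
print("selected penalty:", best)
```

Each fold's model is fit without the points it is scored on, which is what makes reusing the training data for hyperparameter selection safe.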


Resampling Methods: Bootstrap vs jackknife

#artificialintelligence

Resampling is a way to reuse data to generate new, hypothetical samples (called resamples) that are representative of an underlying population. Two popular tools are the bootstrap and the jackknife. Although they have many similarities (e.g., both can estimate the precision of an estimator θ), they have a few notable differences. Bootstrapping is the most popular resampling method today. It uses sampling with replacement to estimate the sampling distribution of a desired estimator. The jackknife, by contrast, recomputes the estimator on the n subsamples obtained by leaving out one observation at a time.
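
The difference between the two methods is easiest to see side by side. This sketch estimates the standard error of the sample mean both ways; the sample values are invented for illustration, and the number of bootstrap resamples (2000) and the seed are arbitrary choices.

```python
import math
import random
import statistics


def bootstrap_se(data, estimator, n_resamples=2000, seed=0):
    """Bootstrap SE: resample n points WITH replacement, recompute the
    estimator, and take the standard deviation of the resampled estimates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


def jackknife_se(data, estimator):
    """Jackknife SE: recompute the estimator n times, leaving out one point each time."""
    n = len(data)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))


# Hypothetical sample, invented for illustration.
sample = [12.1, 9.8, 11.4, 10.2, 13.5, 8.9, 10.7, 11.9, 9.4, 12.8,
          10.1, 11.0, 9.6, 12.3, 10.9, 11.7, 8.7, 13.1, 10.4, 11.2]

print("bootstrap SE of mean:", round(bootstrap_se(sample, statistics.mean), 3))
print("jackknife SE of mean:", round(jackknife_se(sample, statistics.mean), 3))
print("classical s/sqrt(n): ", round(statistics.stdev(sample) / math.sqrt(len(sample)), 3))
```

For the sample mean, the jackknife reproduces the classical s/√n exactly and deterministically, while the bootstrap value varies with the random resamples and is slightly smaller on average because each resample's spread is a plug-in (divide-by-n) estimate.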


The 5 Classification Evaluation metrics every Data Scientist must know

#artificialintelligence

What do we want to optimize for? Many businesses fail to answer this simple question. Every business problem is a little different, and each should be optimized differently. We have all built classification models, and we often evaluate them on accuracy alone.
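
The sketch below, with invented labels, shows why accuracy alone can mislead when one class is rare. The metrics computed here (accuracy, precision, recall, F1) are common choices and are not necessarily the five the article lists; defining precision as 0 when nothing is predicted positive is a convention chosen for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the given positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    # Convention for this sketch: 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical imbalanced labels: 5 positives out of 100.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
# A model that catches 4 of the 5 positives but raises 3 false alarms.
some_model = [1, 1, 1, 1, 0] + [1] * 3 + [0] * 92

print(classification_metrics(y_true, always_negative))
print(classification_metrics(y_true, some_model))
```

The trivial "always negative" model scores 95% accuracy yet finds no positives at all, while the second model's accuracy is barely higher even though its recall is 0.8.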


The 6 Metrics You Need to Optimize for Performance in Machine Learning - Exxact

#artificialintelligence

There are many metrics for measuring a model's performance, depending on the type of machine learning task. In this article, we look at performance measures for classification and regression models and discuss which ones are better suited for optimization. The right metric often depends on the problem being solved. The true positive rate, also called recall, is the go-to performance measure in binary and multiclass classification problems. In most cases, we are mainly interested in correctly predicting one particular class.
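
On the regression side, three widely used measures are mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). This sketch computes them on a handful of invented values; it is not necessarily the set of regression metrics the article discusses.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]  # hypothetical predictions
print(mae(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

MAE and RMSE are in the units of the target, whereas R² is unitless and compares the model with always predicting the mean.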


How to make algorithms fairer

#artificialintelligence

Fixing algorithms may not be the best response to bias. Ethicist Tom Douglas offers a more radical approach to creating fairness, one that aims for 'substantive' rather than 'procedural' fairness outside of design. Our lives are increasingly affected by algorithms. People may be denied loans, jobs, insurance policies, or even parole on the basis of the risk scores that algorithms produce. Yet algorithms are notoriously prone to biases.


Detecting random filenames using (un)supervised machine learning

#artificialintelligence

Combining both n-grams and random forest models to detect malicious activity. An essential part of Managed Detection and Response at Fox-IT is the Security Operations Center. This is our frontline for detecting and analyzing possible threats. Our Security Operations Center brings together the best in human and machine analysis and we continually strive to improve both. For instance, we develop machine learning techniques for detecting malicious content such as DGA domains or unusual SMB traffic.
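
The n-gram half of this approach can be illustrated with a character bigram model: train it on filenames considered normal, then score new names by their average log-probability per bigram, so random-looking names receive lower scores. This is a minimal sketch, not Fox-IT's implementation; the training names are invented, add-one smoothing over the training alphabet is an assumed choice, and the random forest stage (which could take such a score as one input feature) is not shown.

```python
import math
from collections import Counter


def char_bigrams(name):
    """Character bigrams with start (^) and end ($) markers."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train_bigram_model(names):
    """Count bigrams and their first characters over the training names."""
    pair_counts, prev_counts, alphabet = Counter(), Counter(), set()
    for name in names:
        for bg in char_bigrams(name):
            pair_counts[bg] += 1
            prev_counts[bg[0]] += 1
            alphabet.update(bg)
    return pair_counts, prev_counts, len(alphabet)


def avg_log_prob(name, model):
    """Mean log-probability per bigram with add-one smoothing.
    Lower (more negative) scores suggest a more random-looking name."""
    pair_counts, prev_counts, vocab_size = model
    bgs = char_bigrams(name)
    return sum(
        math.log((pair_counts[bg] + 1) / (prev_counts[bg[0]] + vocab_size))
        for bg in bgs
    ) / len(bgs)


# Hypothetical "benign" filenames (extensions stripped), invented for illustration.
BENIGN = ["report", "readme", "setup", "update", "installer", "document", "invoice",
          "presentation", "spreadsheet", "backup", "notes", "summary", "resume"]
model = train_bigram_model(BENIGN)

for candidate in ["reports", "xq7zkv"]:
    print(candidate, round(avg_log_prob(candidate, model), 2))
```

A real deployment would train on a much larger corpus of observed filenames and tune any threshold on labeled data rather than a fixed cutoff.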