Statistical Learning


Machine Learning Series: Logistic Regression Algorithm in Python

#artificialintelligence

The first video in the logistic regression series introduces this powerful classification algorithm, which is used when the dependent (target) variable is categorical; both simple logistic regression and multinomial logistic regression are explained. The second video compares logistic regression with linear regression in terms of purpose, use cases, equations, error minimization, and assumptions. The third video covers four ways of preprocessing data before performing logistic regression: handling missing data, handling categorical data, splitting into train and test sets, and feature scaling.
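
As a concrete illustration of those four preprocessing steps, here is a minimal scikit-learn sketch; the dataset, column names, and imputation choices are hypothetical, not taken from the videos.

```python
# Minimal sketch: logistic regression with the four preprocessing steps
# described above. The data and column names are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age":     [25, 32, None, 51, 46, 29],             # numeric, one missing value
    "city":    ["NY", "SF", "NY", "LA", None, "SF"],   # categorical, one missing value
    "churned": [0, 1, 0, 1, 1, 0],                     # binary (categorical) target
})
X, y = df[["age", "city"]], df["churned"]

# Step 3: split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Steps 1, 2, and 4: impute missing values, one-hot encode the categorical
# column, and scale the numeric column.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age"]),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), ["city"]),
])

# For a multi-class target, LogisticRegression handles the multinomial case as well.
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```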


When Machine Learning Solutions Are Not Possible!

#artificialintelligence

There is a widespread belief among practitioners that Machine Learning (ML) solutions always lead to business improvement. Although ML-based approaches have brought unique capabilities to businesses, there are circumstances under which relying on ML solutions can have a negative impact, or may not be possible at all. The main objective of this article is to discuss different use cases in which employing ML does not fully address the targeted business problem. The article presents five such scenarios and then introduces possible remedies for each. The most straightforward reason not to use ML is an inadequate quantity of data, which hinders training accurate models.


5 types of machine learning algorithms you should know

#artificialintelligence

First, and arguably the most popular type of machine learning algorithm, is linear regression. Linear regression algorithms map simple correlations between two variables in a set of data. A set of inputs and their corresponding outputs are examined and quantified to show a relationship, including how a change in one variable affects the other. A linear regression is plotted as a line on a graph. Linear regression's popularity is due to its simplicity: the algorithm is easily explainable, relatively transparent, and requires little to no parameter tuning.
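
To make that correlation-mapping concrete, here is a minimal sketch fitting a line to two variables; the data are synthetic and purely illustrative.

```python
# Minimal sketch of a two-variable linear regression on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)     # single input variable
y = 3.0 * x.ravel() + 2.0 + rng.normal(0, 1, 100)   # output = 3x + 2 plus noise

model = LinearRegression().fit(x, y)
print(model.coef_[0], model.intercept_)  # slope and intercept of the fitted line
```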


Weights & Biases - Part I: Best Practices for Picking a Machine Learning Model

#artificialintelligence

The number of shiny models out there can be overwhelming, which means people often fall back on the few they trust most and use them on all new problems. This can lead to sub-optimal results. Today we're going to learn how to quickly and efficiently narrow down the space of available models to find those most likely to perform best on your problem type. We'll also see how we can keep track of our models' performances using Weights and Biases and compare them. Unlike Lord of the Rings, in machine learning there is no one ring (model) to rule them all.
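
As a sketch of what that tracking might look like, the snippet below cross-validates a few candidate models and logs each score as a separate Weights & Biases run; the project name, model shortlist, and dataset are assumptions, not the article's own setup.

```python
# Hypothetical sketch: compare candidate models and log their scores to
# Weights & Biases. Assumes `wandb` is installed and you are logged in;
# the project name is a placeholder.
import wandb
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=100),
    "svm": SVC(),
}

for name, model in candidates.items():
    run = wandb.init(project="model-comparison", name=name, reinit=True)
    score = cross_val_score(model, X, y, cv=5).mean()
    wandb.log({"cv_accuracy": score})  # runs appear side by side in the W&B UI
    run.finish()
```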


Introduction to Machine Learning Algorithms

#artificialintelligence

Continuing from a basic introduction to AI & Machine Learning, we will explore common Machine Learning algorithms, such as linear regression and classification, defining how they work and when to use them. Afterwards, we will go through an example of using Watson Studio and AutoAI, which automates the workload of data scientists, to choose the best machine learning algorithm for your problem.


Manual Feature Engineering

#artificialintelligence

There is also a complementary Domino project available. Many data scientists deliver value to their organizations by mapping, developing, and deploying an appropriate ML solution to address a business problem. Feature engineering is useful for data scientists when assessing tradeoff decisions regarding the impact of their ML models. It is a framework for approaching ML as well as a set of techniques for extracting features from raw data that can be used within the models. As Domino seeks to help data scientists accelerate their work, we reached out to AWP Pearson for permission to excerpt the chapter "Manual Feature Engineering: Manipulating Data for Fun and Profit" from the book Machine Learning with Python for Everyone by Mark E. Fenner. Many thanks to AWP Pearson for providing the permissions to excerpt the work and enabling us to provide a complementary, publicly viewable Domino project.

We are going to turn our attention away from expanding our catalog of models [as mentioned previously in the book] and instead take a closer look at the data. Feature engineering refers to manipulation--addition, deletion, combination, mutation--of the features. Remember that features are attribute-value pairs, so we could add or remove columns from our data table and modify values within columns. Feature engineering can be used in a broad sense and in a narrow sense; I'm going to use it in a broad, inclusive sense and point out some gotchas along the way.

Two drivers of feature engineering are (1) background knowledge from the domain of the task and (2) inspection of the data values. The first case includes a doctor's knowledge of important blood-pressure thresholds or an accountant's knowledge of tax-bracket levels. Another example is the use of body mass index (BMI) by medical providers and insurance companies. While it has limitations, BMI is quickly calculated from body weight and height and serves as a surrogate for a characteristic that is very hard to measure accurately: proportion of lean body mass. Inspecting the values of a feature means looking at a histogram of its distribution. For distribution-based feature engineering, we might see multimodal distributions--histograms with multiple humps--and decide to break the humps into bins.
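
To ground both drivers, here is a small illustrative sketch (not from the excerpted chapter): deriving BMI from weight and height as a background-knowledge feature, then binning the result as a distribution-style transformation. The data are made up; the bin edges are the standard clinical cut points.

```python
# Two small illustrations of the ideas above:
# (1) a domain-knowledge feature: BMI derived from weight and height;
# (2) a value-inspection feature: cutting a continuous value into bins.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "weight_kg": [70.0, 85.5, 62.0, 95.0],
    "height_m":  [1.75, 1.80, 1.60, 1.90],
})

# Background-knowledge feature: BMI = weight / height^2.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Distribution-driven feature: if a histogram of `bmi` showed humps, we
# could break it into discrete bins (here, the standard clinical bands).
df["bmi_band"] = pd.cut(df["bmi"],
                        bins=[0, 18.5, 25, 30, np.inf],
                        labels=["underweight", "normal", "overweight", "obese"])
print(df)
```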


101 Machine Learning Algorithms for Data Science with Cheat Sheets

#artificialintelligence

The algorithms have been sorted into 9 groups: Anomaly Detection, Association Rule Learning, Classification, Clustering, Dimensionality Reduction, Ensemble, Neural Networks, Regression, and Regularization. In this post, you'll find 101 machine learning algorithms, including useful infographics to help you know when to use each one (if available). Each of the accordion drop-downs is embeddable if you want to take it with you. All you have to do is click the little 'embed' button in the lower left-hand corner and copy/paste the iframe. All we ask is that you link back to this post.


Daily Digest September 16, 2019 – BioDecoded

#artificialintelligence

Researchers benchmarked 22 classification methods that automatically assign cell identities, including single-cell-specific and general-purpose classifiers. The performance of the methods is evaluated using 27 publicly available single-cell RNA sequencing datasets of different sizes, technologies, species, and levels of complexity. The general-purpose support vector machine classifier has the best overall performance across the different experiments. Researchers also present a novel algorithm for predicting genetic ancestry using only variables that are routinely captured in electronic health records (EHRs), such as self-reported race and ethnicity and condition billing codes. Using patients at Columbia University / NewYork-Presbyterian Irving Medical Center who have both genetic and clinical information, they developed a pipeline that uses only clinical data to predict the genetic ancestry of all patients, more than 80% of whom identify as other or unknown.
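
For readers unfamiliar with the setup, the sketch below shows what a general-purpose SVM classifier applied to a cells-by-genes matrix looks like; the data here are synthetic stand-ins, as the benchmark's 27 public datasets are not reproduced.

```python
# Illustrative only: a general-purpose linear SVM of the kind the benchmark
# found strongest, applied to a synthetic cells x genes count matrix.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(300, 50)).astype(float)  # 300 cells, 50 genes
y = rng.integers(0, 3, size=300)                    # 3 mock cell-type labels

X = np.log1p(X)  # a common normalization for count data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LinearSVC().fit(X_train, y_train)
print(clf.score(X_test, y_test))  # near chance on random labels, as expected
```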


MUSIC CLASSIFICATION USING ARTIFICIAL INTELLIGENCE

#artificialintelligence

Music is the most popular art form, performed and listened to by billions of people every day. There are many genres of music, such as pop, classical, jazz, and folk. Each genre differs in its instruments, tone, rhythm, beats, and flow. Digital music and online streaming have become very popular due to the growing number of users. The goal is to create a machine learning model that classifies music samples into different genres.
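
A minimal sketch of how such a model might be built, using MFCC audio features; the file paths, labels, and choice of classifier are illustrative assumptions, not the article's pipeline.

```python
# Sketch of the genre-classification task: extract MFCC features with
# librosa and fit a simple classifier. Paths and labels are placeholders.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def mfcc_features(path):
    """Average MFCCs over time to get one fixed-length vector per clip."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# Placeholder clip paths and genre labels.
clips  = ["pop_01.wav", "jazz_01.wav", "classical_01.wav", "pop_02.wav"]
labels = ["pop", "jazz", "classical", "pop"]

X = np.array([mfcc_features(p) for p in clips])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
print(clf.predict([mfcc_features("unknown_clip.wav")]))
```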


DataWorkshop Club Conf 2019 Machine Learning Conference Europe

#artificialintelligence

Philippe Esling received a B.Sc. in mathematics and computer science in 2007, an M.Sc. in acoustics and signal processing in 2009, and a PhD in data mining and machine learning in 2012. He was a post-doctoral fellow in the Department of Genetics and Evolution at the University of Geneva in 2012. He has been a tenured associate professor at the Ircam laboratory and Sorbonne Université since 2013. In this short time span, he has authored and co-authored over 20 peer-reviewed papers in prestigious journals. He received a young researcher award for his work in audio querying in 2011 and a PhD award for his work in multiobjective time series data mining.