concise technical overview
Frequent Pattern Mining and the Apriori Algorithm: A Concise Technical Overview
These are all related, yet distinct, concepts that have been used for a very long time to describe an aspect of data mining that many would argue is the very essence of the term data mining: taking a set of data and applying statistical methods to find interesting and previously-unknown patterns within said set of data. We aren't looking to classify instances or perform instance clustering; we simply want to learn patterns of subsets which emerge within a dataset and across instances, which ones emerge frequently, which items are associated, and which items correlate with others. It's easy to see why the above terms become conflated. So, let's have a look at this essential aspect of data mining. Foregoing the Apriori algorithm for now, I will simply use the term frequent pattern mining to refer to the big tent of concepts outlined above, even if somewhat flawed (and even if I personally prefer the less often used term association mining).
Logistic Regression: A Concise Technical Overview
A popular statistical technique to predict binomial outcomes (y 0 or 1) is Logistic Regression. Logistic regression predicts categorical outcomes (binomial / multinomial values of y), whereas linear Regression is good for predicting continuous-valued outcomes (such as weight of a person in kg, the amount of rainfall in cm). The predictions of Logistic Regression (henceforth, LogR in this article) are in the form of probabilities of an event occurring, ie the probability of y 1, given certain values of input variables x. As shown in Figure1, the logit function on the right- with a range of - to, is the inverse of the logistic function shown on the left- with a range of 0 to 1. Estimating the values of B0,B1,..,Bk involves the concepts of probability, odds and log odds. The example dataset here is sourced from the UCLA website. The task is to predict which students graduated with honours or not (y 1 or 0), for 200 students with fields female, read, write, math, hon, femalexmath .
XGBoost: A Concise Technical Overview
"Our single XGBoost model can get to the top three! Our final model just averaged XGBoost models with different random seeds." With entire blogs dedicated to how the sole application of XGBoost can propel one's ranking on Kaggle competitions, it is time we delved deeper into the concepts of XGBoost. Bagging algorithms control for high variance in a model. However, boosting algorithms are considered more effective as they deal with both bias as well as variance (the bias-variance trade-off).
Machine Learning Algorithms: A Concise Technical Overview โ Part 1
Whether you are a newcomer to machine learning, a newbie to specific algorithms or concepts, or a seasoned ML vet looking for a once-over of an algorithm you haven't seen or used in a while, these short and to-the-point tutorials may provide the assistance you are looking for. Each of these posts concisely covers a single, specific machine learning concept. Support Vector Machines (SVMs) are a particular classification strategy. SMVs work by transforming the training dataset into a higher dimension, which is then inspected for the optimal separation boundary, or boundaries, between classes. In SVMs, these boundaries are referred to as hyperplanes, which are identified by locating support vectors, or the instances that most essentially define classes, and their margins, which are the lines parallel to the hyperplane defined by the shortest distance between a hyperplane and its support vectors.
Great Collection of Minimal and Clean Implementations of Machine Learning Algorithms
Want to implement machine learning algorithms from scratch? A recent KDnuggets poll asked "Which methods/algorithms you used in the past 12 months for an actual Data Science-related application?" with results found here. The results are analyzed by industry employment sector and region, but the main take away for the uninitiated is that there are a wide array of algorithms covered. And let's be clear: this is not a complete representation of available machine learning algorithms, but rather a subset of the most-used algorithms (as per our readers). There are lots of machine learning algorithms in existence today.
Support Vector Machines: A Concise Technical Overview
Classification is concerned with building a model that separates data into distinct classes. This model is built by inputting a set of training data for which the classes are pre-labeled in order for the algorithm to learn from. The model is then used by inputting a different dataset for which the classes are withheld, allowing the model to predict their class membership based on what it has learned from the training set. Well-known classification schemes include decision trees and Support Vector Machines, among a whole host of others. As this type of algorithm requires explicit class labeling, classification is a form of supervised learning.
Machine Learning Algorithms: A Concise Technical Overview
Whether you are a newcomer to machine learning, a newbie to specific algorithms or concepts, or a seasoned ML vet looking for a once-over of an algorithm you haven't seen or used in a while, these short and to-the-point tutorials may provide the assistance you are looking for. Each of these posts concisely covers a single, specific machine learning concept. Support Vector Machines remain a popular and time-tested classification algorithm. This post provides a high-level concise technical overview of their functionality. A wide array of clustering techniques are in use today.
Linear Regression, Least Squares & Matrix Multiplication: A Concise Technical Overview
Regression is a time-tested manner for approximating relationships among a given collection of data, and the recipient of unhelpful naming via unfortunate circumstances. Linear regression is a simple algebraic tool which attempts to find the "best" (generally straight) line fitting 2 or more attributes, with one attribute (simple linear regression), or a combination of several (multiple linear regression), being used to predict another, the class attribute. A set of training instances is used to compute the linear model, with one attribute, or a set of attributes, being plotted against another. The model then attempts to identify where new instances would lie on the regression line, given a particular class attribute. It is often confusing for people without a sufficient math background to understand how matrix multiplication fits into linear regression.
Great Collection of Minimal and Clean Implementations of Machine Learning Algorithms
Want to implement machine learning algorithms from scratch? A recent KDnuggets poll asked "Which methods/algorithms you used in the past 12 months for an actual Data Science-related application?" with results found here. The results are analyzed by industry employment sector and region, but the main take away for the uninitiated is that there are a wide array of algorithms covered. And let's be clear: this is not a complete representation of available machine learning algorithms, but rather a subset of the most-used algorithms (as per our readers). There are lots of machine learning algorithms in existence today.