Statistical Learning


Python: Implementing a k-means algorithm with sklearn

#artificialintelligence

Originally posted by Michael Grogan. The below is an example of how sklearn in Python can be used to develop a k-means clustering algorithm. The purpose of k-means clustering is to partition the observations in a dataset into a specific number of clusters in order to aid analysis of the data. This makes it particularly valuable for data visualisation. The particular example used here is that of stock returns.


New Books and Resources About K-Nearest Neighbors Algorithms

#artificialintelligence

Out of all the machine learning algorithms I have come across, KNN has easily been the simplest to pick up. Despite its simplicity, it has proven to be incredibly effective at certain tasks (as you will see in this article). It can be used for both classification and regression problems! It's far more popularly used for classification problems, however. I have seldom seen KNN being implemented on any regression task.
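
To illustrate the point that KNN serves both classification and regression, here is a minimal sketch in plain Python. It is not taken from the linked article: the function names (`k_nearest`, `knn_classify`, `knn_regress`), the toy data, and the choice of Euclidean distance with k=3 are all assumptions made for illustration.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Return the indices of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return order[:k]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: average of the k nearest target values."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)

# Toy data, made up for this sketch: two well-separated groups of points.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 1.2, 0.8, 5.0, 5.4, 4.6]

predicted_label = knn_classify(X, labels, (1.1, 1.0), k=3)
predicted_value = knn_regress(X, targets, (5.0, 5.0), k=3)
print(predicted_label, predicted_value)
```

The only difference between the two uses is the final aggregation step: a vote for classification, a mean for regression.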


Getting Started with TensorFlow and Keras – Maker.io Digi-Key Electronics

#artificialintelligence

In this tutorial, we show you how to configure TensorFlow with Keras on a computer and build a simple linear regression model. If you have access to a modern NVIDIA graphics card (GPU), you can enable tensorflow-gpu to take advantage of the parallel processing afforded by CUDA. The field of Artificial Intelligence (AI) has been around for quite some time. As we move to build an understanding and use cases for Edge AI, we first need to understand some of the popular frameworks for building machine learning models on personal computers (and servers!). These models can then be deployed to edge devices, such as single-board computers (like the Raspberry Pi) and microcontrollers.


December 2019: "Top 40" New R Packages

#artificialintelligence

One hundred fifty-two packages made it to CRAN in December. Here are my "Top 40" picks in ten categories: Data, Genomics, Machine Learning, Mathematics, Medicine, Science, Statistics, Time Series, Utilities, and Visualization. Look here for more information as well as the vignette. Loads and creates spatial data, including layers and tools that are relevant to the activities of the Commission for the Conservation of Antarctic Marine Living Resources (CCAMLR). Have a look at the vignette.


Robust Linear Regression Models for Nonlinear, Heteroscedastic Data

#artificialintelligence

This is where one needs to be careful. Our instinct might be to simply exponentiate the log-scale predictions back to raw-scale y. But our instinct would be wrong. Let's see why that is. If you like, you can skip the little bit of math that follows and scroll down to the section called Duan's smearing estimator.
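
As a rough illustration of why plain exponentiation falls short, the sketch below fits a log-linear model by ordinary least squares and then applies the usual form of Duan's smearing estimator: the naive back-transformed prediction multiplied by the mean of the exponentiated log-scale residuals. The excerpt does not show the article's own derivation or data, so the helper `fit_line`, the variable names, and the numbers are assumptions made up for this example.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up positive, roughly exponential data (an assumption for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.9, 3.6, 7.9, 9.5, 21.0, 24.0]

# Fit the model on the log scale.
log_y = [math.log(v) for v in y]
b0, b1 = fit_line(x, log_y)
residuals = [ly - (b0 + b1 * xi) for xi, ly in zip(x, log_y)]

# Naive back-transform: exponentiate the log-scale prediction.
naive = [math.exp(b0 + b1 * xi) for xi in x]

# Smearing factor: the average of exp(residual) over the training data.
smearing = sum(math.exp(r) for r in residuals) / len(residuals)
corrected = [smearing * p for p in naive]

print(f"smearing factor: {smearing:.4f}")
```

Because the least-squares residuals average to zero, the mean of their exponentials is at least 1, so the naive back-transform tends to underestimate the raw-scale mean and the smearing factor scales it up.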


Fraud detection: the problem, solutions and tools

#artificialintelligence

Fraud is a billion-dollar business. There are many formal definitions, but essentially fraud is the "art" and crime of deceiving and scamming people in their financial transactions. Fraud has always existed throughout human history, but in this age of digital technology, the strategy, extent and magnitude of financial fraud are becoming wide-ranging, from credit card transactions to health benefits to insurance claims. Fraudsters are also getting super creative. Who has never received an email from a Nigerian royal widow looking for someone trusted to whom she can hand over large sums of her inheritance? No wonder fraud is a big deal.


Deep learning vs. machine learning: Understand the differences

#artificialintelligence

Machine learning and deep learning are both forms of artificial intelligence. You can also say, correctly, that deep learning is a specific kind of machine learning. Both machine learning and deep learning start with training and test data and a model and go through an optimization process to find the weights that make the model best fit the data. Both can handle numeric (regression) and non-numeric (classification) problems, although there are several application areas, such as object recognition and language translation, where deep learning models tend to produce better fits than other machine learning models. Machine learning algorithms are often divided into supervised (the training data are tagged with the answers) and unsupervised (any labels that may exist are not shown to the training algorithm).
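
The "optimization process to find the weights" can be sketched with plain gradient descent on a one-variable linear model. This is only an illustration of the general idea, not code from the article: the function name `fit_line_gd`, the learning rate, the step count, and the toy data are assumptions.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training example with the current weights.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (made up for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line_gd(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Deep learning frameworks follow the same loop in spirit, with many more weights and with gradients computed automatically.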


How do We Quantify the Quality of Our Predictions? Part I

#artificialintelligence

We have all worked on different kinds of machine learning models, and each model needs to be evaluated in a different way, depending on the initial data that is provided, the outcome, and the way we as users want to use it. A classification model requires a different evaluation metric than a regression model or a neural net, and it's important to know and understand which metric to use and when. In this series, we go through some of these metrics, starting from the basic and most commonly used ones and moving on to the application-specific and complex metrics that we can use. We will start with the basic metrics from sklearn and progress towards the more complicated metrics after that.
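
To make the classification-versus-regression distinction concrete, here is a small plain-Python sketch of a few basic metrics. The series itself uses sklearn, where `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, `mean_squared_error` and `mean_absolute_error`; the hand-written functions and toy labels below are assumptions for illustration, not the article's code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class labelled `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def mean_squared_error(y_true, y_pred):
    """Average squared difference; penalizes large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference; less sensitive to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification example (made-up labels).
cls_true = [1, 0, 1, 1, 0, 0, 1, 0]
cls_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc = accuracy(cls_true, cls_pred)
prec, rec = precision_recall(cls_true, cls_pred)

# Regression example (made-up values).
reg_true = [3.0, 5.0, 2.0, 7.0]
reg_pred = [2.5, 5.0, 4.0, 8.0]
mse = mean_squared_error(reg_true, reg_pred)
mae = mean_absolute_error(reg_true, reg_pred)

print(acc, prec, rec, mse, mae)
```

Accuracy, precision and recall only make sense for discrete labels, while MSE and MAE measure how far numeric predictions fall from the truth, which is why the two model types need different metrics.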


Time series modeling with Facebook Prophet

#artificialintelligence

When trying to understand time series, there's so much to think about. Is it affected by seasonality? What kind of model should I use, and how well will it perform? All these questions can make time series modeling kind of intimidating, but it doesn't have to be that bad. While working on a project for my data science bootcamp recently, I tried Facebook Prophet, an open-source package for time series modeling developed by … y'know, Facebook.


Understanding K-Means Clustering using Python the easy way

#artificialintelligence

In the previous article, we studied the k-NN. I believe that if we can relate a concept to ourselves or our lives, we have a better chance of understanding it. So I will try to explain everything by relating it to humans. K-means tries to make the intra-cluster data points as similar as possible while also keeping the clusters as different (as far apart) as possible. It assigns data points to a cluster such that the sum of the squared distances between the data points and the cluster's centroid is at a minimum.
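
The objective described above, the sum of squared distances between points and their cluster centroid, can be sketched with a bare-bones version of the standard assign-then-update loop (Lloyd's algorithm). This is an illustrative sketch rather than the article's implementation: the function names, the toy points, and the choice to seed the centroids with the first and last point are assumptions. Real code would typically use random or k-means++ initialization, for example via sklearn's `KMeans`.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of its members (keep it if the cluster is empty)."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return new_centroids

def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances: the quantity k-means minimizes."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

def k_means(points, centroids, max_iter=100):
    """Alternate centroid updates and reassignment until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Toy data, made up for this sketch: two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
start = [points[0], points[-1]]  # simplistic deterministic initialization

labels, centroids = k_means(points, start)
objective = wcss(points, labels, centroids)
print(labels, centroids, round(objective, 4))
```

Each update step moves a centroid to the mean of its members, which is exactly the position that minimizes that cluster's sum of squared distances, so the objective never increases from one iteration to the next.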