 Nearest Neighbor Methods


A Comparative Study of Classification Techniques in Data Mining Algorithms

@machinelearnbot

Classification is used to determine which group each data instance in a given dataset belongs to. It assigns data to different classes according to some constraints. Several major kinds of classification algorithms are in use, including C4.5, ID3, the k-nearest neighbor classifier, Naive Bayes, SVM, and ANN. Generally, a classification technique follows one of three approaches: statistical, machine learning, or neural network. With these approaches in mind, this paper provides an inclusive survey of different classification algorithms and their features and limitations.
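To make the k-nearest neighbor classifier named above concrete, here is a minimal sketch in plain Python. It is not taken from the surveyed paper: the function name `knn_predict`, the toy points, and the choice of Euclidean distance with an unweighted majority vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
```

The classifier has no training phase; all the work happens at query time, which is why its cost grows with the size of the stored dataset.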


A Beginner's Guide to Machine Learning (in Python)

@machinelearnbot

In this course, you will learn the basics of Machine Learning and Data Mining; almost everything you need to get started. You will understand what Big Data is and what Data Science and Data Analytics are. You will learn algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Decision Trees, and Neural Networks. You'll also understand how to combine algorithms into ensembles. Data preprocessing will be taught, and you will understand how to clean your data, transform it, handle categorical features, and handle unbalanced data.


Machine Learning Classification Algorithms using MATLAB

#artificialintelligence

This course is for you if you are fascinated by the field of Machine Learning. It is designed to cover one of the most interesting areas of machine learning, called classification. I will take you step-by-step through this course and will first cover the basics of MATLAB. Following that, we will look into the details of how to use different machine learning algorithms in MATLAB. Specifically, we will be looking at the MATLAB toolbox called the Statistics and Machine Learning Toolbox. We will implement some of the most commonly used classification algorithms, such as K-Nearest Neighbor, Naive Bayes, Discriminant Analysis, Decision Trees, Support Vector Machines, Error Correcting Output Codes and Ensembles.


Data Science: Supervised Machine Learning in Python

@machinelearnbot

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.


Examining the Use of Neural Networks for Feature Extraction: A Comparative Analysis using Deep Learning, Support Vector Machines, and K-Nearest Neighbor Classifiers

arXiv.org Machine Learning

Neural networks in many varieties are touted as very powerful machine learning tools because of their ability to distill large amounts of information from different forms of data, extracting complex features and enabling powerful classification abilities. In this study, we use neural networks to extract features from both images and numeric data and use these extracted features as inputs for other machine learning models, namely support vector machines (SVMs) and k-nearest neighbor classifiers (KNNs), in order to see if neural-network-extracted features enhance the capabilities of these models. We tested 7 different neural network architectures in this manner, 4 for images and 3 for numeric data, training each for varying lengths of time and then comparing the results of the neural network independently to those of an SVM and KNN on the data, and finally comparing these results to models of SVM and KNN trained using features extracted via the neural network architecture. This process was repeated on 3 different image datasets and 2 different numeric datasets. The results show that, in many cases, the features extracted using the neural network significantly improve the capabilities of SVMs and KNNs compared to running these algorithms on the raw features, and in some cases also surpass the performance of the neural network alone. This in turn suggests that it may be a reasonable practice to use neural networks as a means to extract features for classification by other machine learning models for some datasets.


Modeling Dengue Vector Population Using Remotely Sensed Data and Machine Learning

arXiv.org Machine Learning

Mosquitoes are vectors of many human diseases. In particular, Aedes aegypti (Linnaeus) is the main vector for Chikungunya, Dengue, and Zika viruses in Latin America and it represents a global threat. Public health policies that aim at combating this vector require dependable and timely information, which is usually expensive to obtain with field campaigns. For this reason, several efforts have been made to use remote sensing due to its reduced cost. The present work includes the temporal modeling of the oviposition activity (measured weekly on 50 ovitraps in a north Argentinean city) of Aedes aegypti (Linnaeus), based on time series of data extracted from operational earth observation satellite images. We use NDVI, NDWI, LST night, LST day and TRMM-GPM rain from 2012 to 2016 as predictive variables. In contrast to previous works which use linear models, we employ Machine Learning techniques using completely accessible open source toolkits. These models have the advantages of being non-parametric and capable of describing nonlinear relationships between variables. Specifically, in addition to two linear approaches, we assess a Support Vector Machine, an Artificial Neural Network, a K-nearest neighbors model and a Decision Tree Regressor. Considerations are made on parameter tuning and the validation and training approach. The results are compared to linear models used in previous works with similar data sets for generating temporal predictive models. These new tools perform better than linear approaches; in particular, Nearest Neighbor Regression (KNNR) performs the best. These results provide better alternatives to be implemented operatively on the Argentine geospatial risk system that has been running since 2012.
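As an illustration of nearest neighbor regression in general (not the authors' implementation, toolkit, or data), the sketch below predicts a numeric target as the mean of the targets of the k closest training rows. The function name `knn_regress`, the two-feature rows, and every number are invented for this example; in practice the features would usually be standardized first so that no single variable dominates the distance.

```python
import math


def knn_regress(train_features, train_targets, query, k=3):
    """Predict a numeric target as the mean target of the k nearest rows."""
    # Rank the training rows by Euclidean distance to the query row.
    ranked = sorted(
        zip(train_features, train_targets),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Average the targets of the k closest rows.
    nearest_targets = [target for _, target in ranked[:k]]
    return sum(nearest_targets) / len(nearest_targets)


# Hypothetical weekly rows: (vegetation index, daytime surface temperature).
weekly_features = [(0.2, 20.0), (0.3, 22.0), (0.5, 28.0), (0.6, 30.0), (0.7, 31.0)]
# Hypothetical weekly oviposition counts for the same weeks.
weekly_counts = [5.0, 8.0, 20.0, 26.0, 29.0]

estimate = knn_regress(weekly_features, weekly_counts, (0.55, 29.0), k=2)
```

Because the prediction is a local average, the model makes no assumption about the functional form linking predictors and target, which is the non-parametric property the abstract refers to.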


k-Nearest Neighbors by Means of Sequence to Sequence Deep Neural Networks and Memory Networks

arXiv.org Machine Learning

k-Nearest Neighbors is one of the most fundamental but effective classification models. In this paper, we propose two families of models built on a sequence to sequence model and a memory network model to mimic the k-Nearest Neighbors model, which generate a sequence of labels, a sequence of out-of-sample feature vectors and a final label for classification, and thus they could also function as oversamplers. We also propose 'out-of-core' versions of our models which assume that only a small portion of data can be loaded into memory. Computational experiments show that our models outperform k-Nearest Neighbors, a feed-forward neural network and a memory network, due to the fact that our models must produce additional output and not just the label. As an oversampler on imbalanced datasets, the sequence to sequence kNN model often outperforms Synthetic Minority Over-sampling Technique and Adaptive Synthetic Sampling.


Dynamic Ensemble Selection VS K-NN: why and when Dynamic Selection obtains higher classification performance?

arXiv.org Artificial Intelligence

Multiple classifier systems (MCS) focus on the combination of classifiers to obtain better performance than a single robust one. These systems unfold in three major phases: pool generation, selection and integration. One of the most promising MCS approaches is Dynamic Selection (DS), which relies on finding the most competent classifier or ensemble of classifiers to predict each test sample. The majority of the DS techniques are based on the K-Nearest Neighbors (K-NN) definition, and the quality of the neighborhood has a huge impact on the performance of DS methods. In this paper, we perform an analysis comparing the classification results of DS techniques and the K-NN classifier under different conditions. Experiments are performed on 18 state-of-the-art DS techniques over 30 classification datasets and results show that DS methods present a significant boost in classification accuracy even though they use the same neighborhood as the K-NN. The reasons behind the outperformance of DS techniques over the K-NN classifier reside in the fact that DS techniques can deal with samples with a high degree of instance hardness (samples that are located close to the decision border) as opposed to the K-NN. In this paper, we explain not only why DS techniques achieve higher classification performance than the K-NN but also when DS should be used.
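The idea of picking the most competent classifier for each test sample can be sketched with one simple rule: score each classifier in the pool by its accuracy on the k validation points nearest to the query, then use the best scorer. This is only an illustrative sketch of local-accuracy selection, and it is not claimed to match any specific technique evaluated in the paper. The names `select_classifier`, `by_x`, and `by_y`, and the toy data, are hypothetical.

```python
import math


def select_classifier(classifiers, val_points, val_labels, query, k=3):
    """Return the classifier with the highest accuracy near `query`."""
    # The k validation points nearest to the query form the neighborhood
    # (often called the region of competence).
    ranked = sorted(
        range(len(val_points)),
        key=lambda i: math.dist(val_points[i], query),
    )
    region = ranked[:k]

    def local_accuracy(classifier):
        # Fraction of neighborhood points that this classifier labels correctly.
        hits = sum(classifier(val_points[i]) == val_labels[i] for i in region)
        return hits / len(region)

    return max(classifiers, key=local_accuracy)


def by_x(point):
    """Hypothetical weak classifier: thresholds the first feature."""
    return int(point[0] > 0.5)


def by_y(point):
    """Hypothetical weak classifier: thresholds the second feature."""
    return int(point[1] > 0.5)


# Hypothetical validation set: by_y is right near the top left, by_x near the bottom right.
val_points = [(0.0, 0.0), (0.2, 1.0), (0.1, 0.9), (1.0, 0.1), (0.9, 0.0), (1.0, 1.0)]
val_labels = [0, 1, 1, 1, 1, 1]

chosen = select_classifier([by_x, by_y], val_points, val_labels, (0.1, 1.0), k=2)
label = chosen((0.1, 1.0))
```

A plain K-NN vote would use the neighbors' labels directly; here the same neighborhood is used only to judge which classifier to trust, so the final decision can still differ from the neighbors' majority label.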



Machine Learning - The Hitchhiker's Guide to Python

#artificialintelligence

Machine learning is undoubtedly on the rise, slowly climbing into 'buzzword' territory. This is in large part due to misuse and simple misunderstanding of the topics that come with the term. Take a quick glance at the chart below and you'll see this illustrated quite clearly thanks to Google Trends' analysis of interest in the term over the last few years. However, the goal of this article is not to simply reflect on the popularity of machine learning. It is rather to explain and implement relevant machine learning algorithms in a clear and concise way.