

Nearest Neighbor Methods


Data Science: Supervised Machine Learning in Python

@machinelearnbot

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion at the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.


Examining the Use of Neural Networks for Feature Extraction: A Comparative Analysis using Deep Learning, Support Vector Machines, and K-Nearest Neighbor Classifiers

arXiv.org Machine Learning

Neural networks in many varieties are touted as very powerful machine learning tools because of their ability to distill large amounts of information from different forms of data, extracting complex features and enabling powerful classification abilities. In this study, we use neural networks to extract features from both images and numeric data and use these extracted features as inputs for other machine learning models, namely support vector machines (SVMs) and k-nearest neighbor classifiers (KNNs), to test whether neural-network-extracted features enhance the capabilities of these models. We tested 7 neural network architectures in this manner (4 for images and 3 for numeric data), training each for varying lengths of time. We first compared each neural network on its own against an SVM and a KNN trained on the data, and then compared these results against SVM and KNN models trained on features extracted by the neural network architecture. This process was repeated on 3 different image datasets and 2 different numeric datasets. The results show that, in many cases, the features extracted using the neural network significantly improve the capabilities of SVMs and KNNs compared to running these algorithms on the raw features, and in some cases also surpass the performance of the neural network alone. This in turn suggests that it may be a reasonable practice to use neural networks as a means to extract features for classification by other machine learning models for some datasets.
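
The pipeline described above (use a network's hidden-layer activations as the input to a KNN) can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' setup: the single ReLU layer and its weights W and b are hypothetical stand-ins for a trained network, and extract_features and knn_predict are names chosen here.

```python
import math
from collections import Counter


def relu(x):
    return max(0.0, x)


def extract_features(x, W, b):
    """Hidden-layer activations ReLU(W x + b) of a one-layer network.

    W and b are hypothetical stand-ins for the weights of a trained network.
    """
    return [relu(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: class "A" when the first coordinate is larger, "B" otherwise.
X = [(2, 1), (5, 3), (1, 2), (3, 6)]
y = ["A", "A", "B", "B"]
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]

query = (10, 9)
raw_pred = knn_predict(X, y, query, k=1)  # 1-NN on the raw coordinates
features = [extract_features(x, W, b) for x in X]
feat_pred = knn_predict(features, y, extract_features(query, W, b), k=1)
print(raw_pred, feat_pred)  # prints "B A"
```

In this toy case the query (10, 9) belongs to class A, raw-feature 1-NN picks B, and 1-NN on the extracted features picks A. It only illustrates the mechanism; it says nothing about the datasets in the study.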


Modeling Dengue Vector Population Using Remotely Sensed Data and Machine Learning

arXiv.org Machine Learning

Mosquitoes are vectors of many human diseases. In particular, Aedes aegypti (Linnaeus) is the main vector for the Chikungunya, Dengue, and Zika viruses in Latin America, and it represents a global threat. Public health policies that aim to combat this vector require dependable and timely information, which is usually expensive to obtain through field campaigns. For this reason, several efforts have been made to use remote sensing because of its reduced cost. The present work models the temporal oviposition activity of Aedes aegypti (Linnaeus), measured weekly on 50 ovitraps in a city in northern Argentina, based on time series of data extracted from operational Earth observation satellite images. We use NDVI, NDWI, nighttime and daytime land surface temperature (LST), and TRMM-GPM rainfall from 2012 to 2016 as predictive variables. In contrast to previous works, which use linear models, we employ machine learning techniques using fully accessible open-source toolkits. These models have the advantages of being non-parametric and capable of describing nonlinear relationships between variables. Specifically, in addition to two linear approaches, we assess a Support Vector Machine, an Artificial Neural Network, a K-Nearest Neighbors regressor, and a Decision Tree Regressor. We also discuss parameter tuning and the training and validation approach. The results are compared to linear models used in previous works with similar data sets for generating temporal predictive models. These new tools perform better than the linear approaches; in particular, K-Nearest Neighbor Regression (KNNR) performs best. These results provide better alternatives for operational implementation in the Argentine geospatial Risk system, which has been running since 2012.
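
K-nearest neighbor regression, the best performer in this study, predicts a target by averaging the targets of the k most similar training examples. The sketch below is a minimal version under our own assumptions (Euclidean distance on z-scored features, unweighted mean); it is not the authors' configuration, and the function names and the weekly rows are hypothetical. Standardizing matters because predictors such as NDVI and land surface temperature live on very different scales.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column; returns scaled rows plus the column means and stds."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    scaled = [scale(row, means, stds) for row in rows]
    return scaled, means, stds


def scale(row, means, stds):
    """Apply training-set means/stds to one row (e.g. a new week to predict)."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_regress(train_X, train_y, query, k):
    """Predict the unweighted mean target of the k nearest training rows."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical weekly rows: [NDVI, night LST in deg C, rain in mm] -> egg count.
weeks = [[0.31, 18.0, 12.0], [0.35, 21.0, 40.0], [0.28, 15.0, 3.0], [0.40, 24.0, 55.0]]
eggs = [120.0, 310.0, 60.0, 420.0]
scaled, means, stds = standardize(weeks)
query = scale([0.37, 22.0, 45.0], means, stds)
print(knn_regress(scaled, eggs, query, k=2))
```

A common variant weights each neighbor by inverse distance instead of taking a plain mean; which one suits the ovitrap data is an empirical question.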


k-Nearest Neighbors by Means of Sequence to Sequence Deep Neural Networks and Memory Networks

arXiv.org Machine Learning

k-Nearest Neighbors is one of the most fundamental yet effective classification models. In this paper, we propose two families of models, built on a sequence-to-sequence model and on a memory network model, that mimic the k-Nearest Neighbors model. They generate a sequence of labels, a sequence of out-of-sample feature vectors, and a final label for classification, so they can also function as oversamplers. We also propose 'out-of-core' versions of our models, which assume that only a small portion of the data can be loaded into memory. Computational experiments show that our models outperform k-Nearest Neighbors, a feed-forward neural network, and a memory network, because our models must produce additional output and not just the label. As an oversampler on imbalanced datasets, the sequence-to-sequence kNN model often outperforms the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN).
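
For reference, SMOTE, one of the baselines named above, creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbors. The sketch below is a simplified SMOTE, not the paper's sequence-to-sequence oversampler; smote_like is a hypothetical name, and the random generator is passed in so runs are reproducible.

```python
import math
import random


def smote_like(minority, k, n_new, rng):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbors.
    Requires at least two minority samples.
    """
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority, k=2, n_new=5, rng=random.Random(42))
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, SMOTE never leaves the region spanned by the minority class; generating genuinely out-of-sample feature vectors is part of what the proposed models do differently.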


Boltzmann Encoded Adversarial Machines

arXiv.org Machine Learning

Restricted Boltzmann Machines (RBMs) are a class of generative neural networks that are typically trained to maximize a log-likelihood objective function. We argue that likelihood-based training strategies may fail because the objective does not sufficiently penalize models that place high probability in regions where the training data distribution has low probability. To overcome this problem, we introduce Boltzmann Encoded Adversarial Machines (BEAMs). A BEAM is an RBM trained against an adversary that uses the hidden layer activations of the RBM to discriminate between the training data and the probability distribution generated by the model. We present experiments demonstrating that BEAMs outperform RBMs and GANs on multiple benchmarks.
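
To make "hidden layer activations of the RBM" concrete: in a binary RBM with weights W, visible biases b, and hidden biases c, the activation of hidden unit j is p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij), and the free energy F(v) gives the unnormalized likelihood, p(v) proportional to exp(-F(v)). The sketch below computes both for one visible vector. It is a standard binary-RBM illustration with names of our own choosing; it does not implement the BEAM adversary or any training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pre_activation(v, W, j):
    """Input to hidden unit j from visible vector v (W is visible x hidden)."""
    return sum(v_i * W[i][j] for i, v_i in enumerate(v))


def hidden_activations(v, W, c):
    """p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j]) for each hidden unit."""
    return [sigmoid(c_j + pre_activation(v, W, j)) for j, c_j in enumerate(c)]


def free_energy(v, W, b, c):
    """F(v) = -sum_i b_i v_i - sum_j log(1 + exp(c_j + sum_i v_i W[i][j])).

    No overflow protection: very large pre-activations would need a stable softplus.
    """
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = sum(math.log1p(math.exp(c_j + pre_activation(v, W, j)))
                      for j, c_j in enumerate(c))
    return -visible_term - hidden_term


v = [1, 0]
W = [[2.0], [0.0]]  # 2 visible units, 1 hidden unit
print(hidden_activations(v, W, [0.0]), free_energy(v, W, [0.0, 0.0], [0.0]))
```

Lower free energy means higher model probability, which is why a purely likelihood-based objective only indirectly controls where the model puts its mass.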


Dynamic Ensemble Selection VS K-NN: why and when Dynamic Selection obtains higher classification performance?

arXiv.org Artificial Intelligence

Multiple classifier systems (MCS) focus on combining classifiers to obtain better performance than a single robust one. These systems involve three major phases: pool generation, selection, and integration. One of the most promising MCS approaches is Dynamic Selection (DS), which relies on finding the most competent classifier or ensemble of classifiers to predict each test sample. The majority of DS techniques are based on the K-Nearest Neighbors (K-NN) definition, and the quality of the neighborhood has a huge impact on the performance of DS methods. In this paper, we perform an analysis comparing the classification results of DS techniques and the K-NN classifier under different conditions. Experiments are performed on 18 state-of-the-art DS techniques over 30 classification datasets, and the results show that DS methods present a significant boost in classification accuracy even though they use the same neighborhood as the K-NN. DS techniques outperform the K-NN classifier because they can deal with samples with a high degree of instance hardness (samples located close to the decision border), whereas the K-NN cannot. In this paper, we explain not only why DS techniques achieve higher classification performance than the K-NN but also when DS should be used.
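
Many DS techniques follow the pattern described above: define a region of competence as the K nearest validation samples, then select the classifier that performs best there. The sketch below implements Overall Local Accuracy (OLA), one classic DS rule chosen here for illustration; we do not claim it matches any specific technique evaluated in the paper. Classifiers are modeled as plain Python callables, which is an assumption of this sketch.

```python
import math


def ola_select(classifiers, val_X, val_y, query, k):
    """Overall Local Accuracy: return the classifier that is correct on the most
    of the k validation samples nearest to query (ties go to the first listed)."""
    region = sorted(range(len(val_X)), key=lambda i: math.dist(val_X[i], query))[:k]

    def local_hits(clf):
        return sum(clf(val_X[i]) == val_y[i] for i in region)

    return max(classifiers, key=local_hits)


def ds_predict(classifiers, val_X, val_y, query, k):
    """Dynamic selection: pick one classifier per query, then let it predict."""
    return ola_select(classifiers, val_X, val_y, query, k)(query)


# Two deliberately weak, hypothetical classifiers, each right in one region only.
def says_zero(x):
    return 0


def says_one(x):
    return 1


val_X = [[0], [1], [2], [8], [9], [10]]
val_y = [0, 0, 0, 1, 1, 1]
pool = [says_zero, says_one]
print(ds_predict(pool, val_X, val_y, [1.5], k=3), ds_predict(pool, val_X, val_y, [9.5], k=3))
```

The neighborhood is used only to judge competence; the final label comes from the selected classifier, which is what lets DS differ from a plain K-NN vote over the same neighbors.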


Machine Learning & Tensorflow - Google Cloud Approach

@machinelearnbot

Then this course is for you! This course has been designed by experts so that we can share our knowledge and help you learn complex theory, algorithms and coding libraries in a simple way. We will walk you step-by-step into the World of Machine Learning. With every tutorial you will develop new skills and improve your understanding of this challenging yet lucrative field of ML. This course is fun and exciting, but at the same time we dive deep into Machine Learning.


Machine Learning - The Hitchhiker's Guide to Python

#artificialintelligence

Machine learning is undoubtedly on the rise, slowly climbing into 'buzzword' territory. This is in large part due to misuse and simple misunderstanding of the topics that come with the term. Take a quick glance at the chart below and you'll see this illustrated quite clearly thanks to Google Trends' analysis of interest in the term over the last few years. However, the goal of this article is not simply to reflect on the popularity of machine learning. Rather, it is to explain and implement relevant machine learning algorithms in a clear and concise way.


Data Science x Project Planning

@machinelearnbot

The intended audience for this short blog post is data science practitioners who seek to implement predictive algorithms in a business-project setting, with a special focus on presenting the work process flow. We will briefly introduce the k-Nearest Neighbors (k-NN) algorithm and put more emphasis on the key project phases than on the technical theory behind the algorithm and its prediction performance. The example business project is a typical sales forecasting problem: we want to accurately predict the future quantity sold of a number of products in order to manage our inventory more wisely. The k-NN algorithm is probably better known as a classifier, where we use a number of nearby points to determine the outcome for our target. The rationale is straightforward: if we use height and age as our inputs and gender as our target, and a person is 25 years old and 6 feet tall, it makes sense to say that he is more likely to be male if the 5 people closest to him in age and height happen to be male. However, k-NN can also be applied in an unsupervised setting, where we simply find the most similar data points instead.
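
The height-and-age example above can be sketched directly: nearest_neighbors covers the unsupervised use (find the most similar points), and classify adds a majority vote over their labels for the supervised use. The people in the data are made up for illustration, and both function names are hypothetical.

```python
import math
from collections import Counter


def nearest_neighbors(points, query, k):
    """Unsupervised use: indices of the k points closest to query (Euclidean)."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], query))[:k]


def classify(points, labels, query, k):
    """Supervised use: majority label among the k nearest neighbors, plus the vote counts."""
    votes = Counter(labels[i] for i in nearest_neighbors(points, query, k))
    return votes.most_common(1)[0][0], votes


# Made-up people: (age in years, height in feet) with a gender label.
people = [(24, 5.9), (26, 6.1), (25, 6.2), (27, 5.8), (23, 6.0),
          (25, 5.3), (40, 5.5), (30, 5.4)]
genders = ["male", "male", "male", "male", "male", "female", "female", "female"]

label, votes = classify(people, genders, (25, 6.0), k=5)
print(label, dict(votes))
```

With k=5, the query (25 years, 6.0 ft) receives four "male" votes and one "female" vote. Note that age in years dominates height in feet in this unscaled distance, so in practice features are usually standardized first.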


Scalable attribute-aware network embedding with locality

arXiv.org Artificial Intelligence

Adding node attributes to network embedding helps the learned joint representation capture features from both topology and attributes. Recent research on joint embedding has shown promising performance on a variety of tasks by embedding the two spaces together. However, because they require global information, existing approaches do not scale well. Here we propose SANE, a scalable attribute-aware network embedding algorithm with locality, to learn the joint representation from topology and attributes. By enforcing the alignment of a local linear relationship between each node and its K nearest neighbors in the topology and attribute spaces, the joint embedding representations are more informative than a single representation from topology or attributes alone. We argue that the locality in SANE is the key to learning the joint representation at scale. Using several real-world networks from diverse domains, we demonstrate the efficacy of SANE in terms of performance and scalability. For label classification, SANE achieves the highest F1-score on most datasets compared with other state-of-the-art joint representation algorithms, and comes close to a baseline method that needs label information as extra input. Moreover, SANE achieves up to a 71.4% performance gain over the single topology-based algorithm. For scalability, we demonstrate the linear time complexity of SANE. In addition, we observe that when the network scales to 100,000 nodes, the "learning joint embedding" step of SANE takes only about 10 seconds.
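
The abstract does not spell out how SANE formalizes the "local linear relationship" between a node and its K nearest neighbors. One common formulation, borrowed here from locally linear embedding (LLE), writes each point as an affine combination of its neighbors, with weights that sum to 1 and minimize the reconstruction error. The sketch below computes such weights in plain Python. This formulation is our assumption for illustration, not SANE's published objective, and solve and reconstruction_weights are hypothetical helper names.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def reconstruction_weights(x, neighbors, reg=1e-3):
    """LLE-style weights w, summing to 1, that minimize
    ||x - sum_j w_j * neighbors[j]||^2, with a small ridge term for stability."""
    k = len(neighbors)
    diffs = [[xi - ni for xi, ni in zip(x, n)] for n in neighbors]
    # Local Gram matrix C[j][l] = (x - n_j) . (x - n_l)
    C = [[sum(a * b for a, b in zip(diffs[j], diffs[l])) for l in range(k)]
         for j in range(k)]
    trace = sum(C[j][j] for j in range(k))
    eps = reg * trace if trace > 0 else reg
    for j in range(k):
        C[j][j] += eps
    w = solve(C, [1.0] * k)  # solve C w = 1, then rescale so the weights sum to 1
    total = sum(w)
    return [wi / total for wi in w]


print(reconstruction_weights([0.5, 0.0], [[0.0, 0.0], [1.0, 0.0]]))
```

Each node only touches its own K neighbors, so computing all weights costs time linear in the number of nodes for fixed K, which is consistent with the locality-for-scalability argument in the abstract.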