Nearest Neighbor Methods


K-Nearest Neighbors (KNN): Solving Classification Problems

#artificialintelligence

In this tutorial, we are going to use the K-Nearest Neighbors (KNN) algorithm to solve a classification problem. Firstly, what exactly do we mean by classification? Classifying a variable means assigning each result to one of a set of discrete groups. The KNN algorithm is one of the most basic, yet most commonly used, algorithms for solving classification problems. KNN works by assigning each test observation the majority class of its k nearest training observations, which in practice often achieves high classification accuracy.
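The classification step the tutorial describes can be sketched in a few lines with scikit-learn (an illustrative example, not the tutorial's own code; the Iris dataset and k = 5 are assumptions):

```python
# Illustrative KNN classification with scikit-learn; the Iris dataset and
# k = 5 are assumptions, not taken from the tutorial itself.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Each test point is assigned the majority class of its 5 nearest
# training points under Euclidean distance.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
```

With k = 5 on Iris, accuracy is typically well above 90 per cent; k is a hyperparameter worth tuning by cross-validation.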


Using Eigencentrality to Estimate Joint, Conditional and Marginal Probabilities from Mixed-Variable Data: Method and Applications

arXiv.org Machine Learning

Abstract: The ability to estimate joint, conditional and marginal probability distributions over some set of variables is of great utility for many common machine learning tasks. However, estimating these distributions can be challenging, particularly in the case of data containing a mix of discrete and continuous variables. This paper presents a nonparametric method for estimating these distributions directly from a dataset. The data are first represented as a graph consisting of object nodes and attribute value nodes. Depending on the distribution to be estimated, an appropriate eigenvector equation is then constructed. This equation is then solved to find the corresponding stationary distribution of the graph, from which the required distributions can be estimated and sampled. The paper demonstrates how the method can be applied to many common machine learning tasks, including classification, regression, missing value imputation, outlier detection, random vector generation, and clustering.

Being able to estimate joint, conditional and marginal probabilities from some dataset allows a broad range of useful tasks to be performed. For example, classification and regression involve predicting the value of some target variable conditional on the values of the other variables. If we can sample values from the estimated distributions, we can perform random vector generation by generating full random vectors that display the same correlations as the vectors (i.e., data points) in the original data [4], [5]. If we can estimate the joint distribution for the full dataset, then we should also be able to do this for subsets of the data, leading to the use of Expectation-Maximization [6] to cluster the data [7]. Taken together, these activities form a large chunk of the tasks commonly performed in machine learning.


Study and Observation of the Variation of Accuracies of KNN, SVM, LMNN, ENN Algorithms on Eleven Different Datasets from UCI Machine Learning Repository

arXiv.org Machine Learning

Machine learning enables computers to learn from data without being explicitly programmed [1, 2]. Machine learning can be classified as supervised and unsupervised learning. In supervised learning, computers learn a function that maps an input to an output based on example input-output pairs [3]. Among the most efficient and widely used supervised learning algorithms are K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Large Margin Nearest Neighbor (LMNN), and Extended Nearest Neighbor (ENN). The main contribution of this paper is to implement these learning algorithms on eleven different datasets from the UCI machine learning repository and observe how the accuracy of each algorithm varies across datasets. Analyzing the accuracies gives a brief idea of the relationship between the machine learning algorithms and data dimensionality. All the algorithms are implemented in Matlab. From these accuracy observations, a comparison can be built among KNN, SVM, LMNN, and ENN regarding their performance on each dataset.
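The paper's experiments are in Matlab; as a rough Python sketch of the same kind of accuracy comparison, the two of those algorithms available in scikit-learn (KNN and SVM) can be cross-validated on a UCI-style dataset. The Wine dataset, the hyperparameters, and the 5-fold protocol are assumptions here; LMNN and ENN would need additional libraries.

```python
# Rough sketch of a per-dataset accuracy comparison between KNN and SVM,
# in the spirit of the paper (which uses Matlab and also covers LMNN/ENN).
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Scaling matters for both distance-based KNN and the RBF-kernel SVM.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
accuracies = {name: cross_val_score(m, X, y, cv=5).mean()
              for name, m in models.items()}
```

Repeating this loop over each of the eleven datasets would reproduce the shape of the paper's comparison table.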


Explainable time series tweaking via irreversible and reversible temporal transformations

arXiv.org Machine Learning

Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.
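The nearest-neighbor baseline idea can be illustrated in a heavily simplified form (this is not the paper's actual algorithm; the toy data and the "replace with the closest example of the target class" shortcut are assumptions): to flip a classifier's decision, move the series toward the closest training series of the desired class under Euclidean distance.

```python
# Heavily simplified sketch of a nearest-neighbor tweaking baseline:
# return the training series of the target class closest to x.
import numpy as np

def nn_tweak(x, X_train, y_train, target_class):
    """Nearest training series of `target_class` to `x` (Euclidean)."""
    candidates = X_train[y_train == target_class]
    dists = np.linalg.norm(candidates - x, axis=1)
    return candidates[np.argmin(dists)]

# Toy example: two classes of short "time series".
X_train = np.array([[0.0, 0.1, 0.0],   # class 0
                    [0.1, 0.0, 0.1],   # class 0
                    [1.0, 1.1, 1.0],   # class 1
                    [0.9, 1.0, 1.1]])  # class 1
y_train = np.array([0, 0, 1, 1])

x = np.array([0.2, 0.2, 0.2])          # currently on the class-0 side
x_tweaked = nn_tweak(x, X_train, y_train, target_class=1)
```

The paper's actual formulation instead searches for the minimum number of changes to the original series, which is what makes the problem NP-hard.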


Data Science: Supervised Machine Learning in Python

#artificialintelligence

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.


How Complex is your classification problem? A survey on measuring classification complexity

arXiv.org Machine Learning

Extracting characteristics from the training datasets of classification problems has proven effective in a number of meta-analyses. Among them, measures of classification complexity can estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the existing measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on the challenging characteristics of the problems. This paper surveys and analyzes measures that can be extracted from the training datasets in order to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed and discussed, allowing us to identify opportunities for future work in the area. Finally, a description is given of an R package named Extended Complexity Library (ECoL) that implements a set of complexity measures and is made publicly available.
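As a concrete example of such a measure, here is a rough sketch of Fisher's discriminant ratio for binary problems, one of the classical feature-based complexity measures the survey covers (the exact definition and normalization used by ECoL differ; the synthetic data are an assumption):

```python
# Rough sketch of a per-feature Fisher's discriminant ratio for a binary
# problem: (mu0 - mu1)^2 / (var0 + var1). Higher values indicate a feature
# that separates the classes more easily, i.e. a less complex problem.
import numpy as np

def fisher_ratio(X, y):
    """Per-feature discriminant ratio for classes labelled 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    return num / den

# Synthetic data: feature 0 separates the classes well; feature 1 is noise.
rng = np.random.default_rng(0)
X = np.column_stack([
    np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)]),
    rng.normal(0, 1, 200),
])
y = np.array([0] * 100 + [1] * 100)
f1 = fisher_ratio(X, y)
```

On this data the discriminative feature scores far higher than the noise feature, which is exactly the kind of signal complexity measures summarize over a whole dataset.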


Missing Data Imputation for Supervised Learning

arXiv.org Machine Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different levels of additional missing-data perturbation. We show that imputation methods can increase predictive accuracy in the presence of missing-data perturbation, and that the perturbation itself can actually improve prediction accuracy by regularizing the classifier. We achieve state-of-the-art results on the Adult dataset with missing-data perturbation and k-nearest-neighbors (k-NN) imputation.
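A minimal sketch of k-NN imputation using scikit-learn's KNNImputer (an illustrative example on toy numeric data, not the paper's categorical setup or code):

```python
# Minimal k-NN imputation sketch: each missing entry is replaced by the mean
# of that feature over the k nearest rows, where distances are computed over
# the features both rows have present.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 6.0],
              [np.nan, 8.0]])

imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
```

For categorical data as in the paper, one would instead impute the mode of the neighbors (or one-hot encode first), since averaging category codes is not meaningful.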


7 Machine Learning Algorithms To Start Learning.... MarkTechPost

#artificialintelligence

The article briefly characterizes each algorithm in turn. One is a simple algorithm that can be used as a performance baseline, and its methodology is used mostly for forecasting and for finding cause-and-effect relationships between data variables. Another reads data points that are separated into several classes and predicts the classification of a new sample point. A third gives great results when used for textual data analysis. Another is an unsupervised learning method used on unlabelled data sources.


Machine Learning Training Bootcamp : Tonex.Com

#artificialintelligence

Machine Learning training bootcamp is a 3-day specialized training course that covers the essentials of machine learning, a form and application of artificial intelligence (AI). Machine learning automates the data-analysis process by enabling computers, machines and IoT devices to learn and adapt through experience applied to particular tasks, without explicit programming.

Learning Objectives:
- Learn about Artificial Intelligence and Machine Learning
- List similarities and differences between AI, Machine Learning and Data Mining
- Learn how Artificial Intelligence uses data to offer solutions to existing problems
- Explore how Machine Learning goes beyond AI to offer the data necessary for a machine to learn, adapt and optimize
- Clarify how Data Mining can serve as a foundation for AI and machine learning, using existing information to highlight patterns
- List the various applications of machine learning and related algorithms
- Learn how to classify the types of learning, such as supervised and unsupervised learning
- Implement supervised learning techniques such as linear and logistic regression
- Use unsupervised learning algorithms, including deep learning, clustering and recommender systems (RS), which help users find new items or services such as books, music, transportation, people and jobs based on information about the user or the recommended item
- Learn about classification data and Machine Learning models
- Select the best algorithms to apply to Machine Learning
- Make accurate predictions and analyses to effectively solve potential problems
- List Machine Learning concepts, principles, algorithms, tools and applications
- Learn the concepts and operation of neural networks, support vector machines, kernel SVM, naive Bayes, decision tree classifiers, random forest classifiers, logistic regression, K-nearest neighbors, K-means and clustering
- Comprehend the theoretical concepts and how they relate to the practical aspects of machine learning
- Be able to model a wide variety of robust machine learning algorithms, including deep learning, clustering and recommendation systems

Course Agenda and Topics:
- The Basics of Machine Learning
- Machine Learning Techniques, Tools and Algorithms
- Data and Data Science
- Review of Terminology and Principles
- Applied Artificial Intelligence (AI) and Machine Learning
- Popular Machine Learning Methods
- Learning Applied to Machine Learning
- Principal Component Analysis
- Principles of Supervised Machine Learning Algorithms
- Principles of Unsupervised Machine Learning
- Regression Applied to Machine Learning
- Principles of Neural Networks
- Large Scale Machine Learning
- Introduction to Deep Learning
- Applying Machine Learning
- Overview of Algorithms
- Overview of Tools and Processes


AI threatens yet more jobs – now, lab rats: Animal testing could be on the way out, thanks to machine learning

#artificialintelligence

Machine learning algorithms can help scientists predict chemical toxicity to a similar degree of accuracy as animal testing, according to a paper published this week in Toxicological Sciences. A whopping €3bn (over $3.5bn) is spent every year studying the negative impacts of chemicals on animals like rats, rabbits or monkeys. The top nine most frequently run safety experiments resulted in the death of the poor critters 57 per cent of the time in Europe in 2011. By using software, chemists may be able to spend less on animal testing and save more creatures. To demonstrate this, a team of researchers first scoured a range of databases to label 80,908 different chemicals.