AITopics

1909.1172

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

#artificialintelligenceSep-9-2019, 06:13:22 GMT

Exercise Classification with Machine Learning (Part I)

The first post will focus on a more algorithmic approach using k-Nearest Neighbors to classify an unknown video, and in the second post, we'll look at an exclusively machine learning (ML) approach. Code for everything we're going to cover can be found on this GitHub repository. The algorithmic approach (Part I) is written in Swift and is available as a CocoaPod. The ML approach (Part II) is written in Python/TensorFlow and can be found as part of the GitHub repository. We want to build a system which takes as input a video of a person performing an exercise and outputs a class label which describes the video.

artificial intelligence, machine learning, video, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.36)

Qiao, Xingye, Duan, Jiexin, Cheng, Guang

Rates of Convergence for Large-scale Nearest Neighbor Classification

arXiv.org Machine LearningSep-3-2019

Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide and conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting. We name this method the big Nearest Neighbor (bigNN) classifier, and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability, which are proven to be the same rates as the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time that is required for achieving the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with proven convergence rate. We find that in the distributed setting, the optimal choice of the neighbor k should scale with both the total sample size and the number of partitions, and there is a theoretical upper limit for the latter. Numerical studies have verified the theoretical findings.

artificial intelligence, machine learning, null, (18 more...)

1909.01464

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

#artificialintelligenceSep-2-2019, 23:41:35 GMT

Understanding Supervised Learning In One Article

As you might know, supervised machine learning is one of the most commonly used and successful types of machine learning. In this article, we will describe supervised learning in more detail and explain several popular supervised learning algorithms. Remember that supervised learning is used whenever we want to predict a certain outcome from a given input, and we have examples of input/output pairs. We build a machine learning model from these input/output pairs, which comprise our training set. Our goal is to make accurate predictions for new, never-before-seen data. Supervised learning often requires human effort to build the training set, but afterwards automates and often speeds up an otherwise laborious or infeasible task. There are two major types of supervised machine learning problems, called classification and regression. In classification, the goal is to predict a class label, which is a choice from a predefined list of possibilities.

dataset, inductive learning, oncology, (20 more...)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.30)

#artificialintelligenceAug-27-2019, 05:13:17 GMT

Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

Given a data set D containing millions of data points and a data consumer who is willing to pay for X to train a machine learning (ML) model over D, how should we distribute this X to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all N data points, it requires O(2 N) model evaluations for exact computation and O(N N) for (ϵ, δ)-approximation. In this paper, we focus on one popular family of ML models relying on K-nearest neighbors (KNN). The most surprising result is that for unweighted KNN classifiers and regressors, the Shapley value of all N data points can be computed, exactly, in O(N N) time -- an exponential improvement on computational complexity!

algorithm, artificial intelligence, machine learning, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.63)

#artificialintelligenceAug-27-2019, 05:13:11 GMT

Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

Given a data set D containing millions of data points and a data consumer who is willing to pay for X to train a machine learning (ML) model over D, how should we distribute this X to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all N data points, it requires O(2 N) model evaluations for exact computation and O(N N) for (ϵ, δ)-approximation. In this paper, we focus on one popular family of ML models relying on K-nearest neighbors (KNN). The most surprising result is that for unweighted KNN classifiers and regressors, the Shapley value of all N data points can be computed, exactly, in O(N N) time -- an exponential improvement on computational complexity!

algorithm, artificial intelligence, machine learning, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.63)

arXiv.org Machine LearningAug-22-2019

Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

Jia, Ruoxi, Dao, David, Wang, Boxin, Hubis, Frances Ann, Gurel, Nezihe Merve, Li, Bo, Zhang, Ce, Spanos, Costas J., Song, Dawn

Given a data set $\mathcal{D}$ containing millions of data points and a data consumer who is willing to pay for \$$X$ to train a machine learning (ML) model over $\mathcal{D}$, how should we distribute this \$$X$ to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all $N$ data points, it requires $O(2^N)$ model evaluations for exact computation and $O(N\log N)$ for $(\epsilon, \delta)$-approximation. In this paper, we focus on one popular family of ML models relying on $K$-nearest neighbors ($K$NN). The most surprising result is that for unweighted $K$NN classifiers and regressors, the Shapley value of all $N$ data points can be computed, exactly, in $O(N\log N)$ time -- an exponential improvement on computational complexity! Moreover, for $(\epsilon, \delta)$-approximation, we are able to develop an algorithm based on Locality Sensitive Hashing (LSH) with only sublinear complexity $O(N^{h(\epsilon,K)}\log N)$ when $\epsilon$ is not too small and $K$ is not too large. We empirically evaluate our algorithms on up to $10$ million data points and even our exact algorithm is up to three orders of magnitude faster than the baseline approximation algorithm. The LSH-based approximation algorithm can accelerate the value calculation process even further. We then extend our algorithms to other scenarios such as (1) weighed $K$NN classifiers, (2) different data points are clustered by different data curators, and (3) there are data analysts providing computation who also requires proper valuation.

algorithm, artificial intelligence, health & medicine, (20 more...)

1908.08619

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.92)
Leisure & Entertainment > Games (0.46)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Yesilli, Melih C., Khasawneh, Firas A., Otto, Andreas

Chatter Detection in Turning Using Machine Learning and Similarity Measures of Time Series via Dynamic Time Warping

arXiv.org Machine LearningAug-5-2019

Chatter detection from sensor signals has been an active field of research. While some success has been reported using several featurization tools and machine learning algorithms, existing methods have several drawbacks such as manual preprocessing and requiring a large data set. In this paper, we present an alternative approach for chatter detection based on K-Nearest Neighbor (kNN) algorithm for classification and the Dynamic Time Warping (DTW) as a time series similarity measure. The used time series are the acceleration signals acquired from the tool holder in a series of turning experiments. Our results, show that this approach achieves detection accuracies that in most cases outperform existing methods. We compare our results to the traditional methods based on Wavelet Packet Transform (WPT) and the Ensemble Empirical Mode Decomposition (EEMD), as well as to the more recent Topological Data Analysis (TDA) based approach. We show that in three out of four cutting configurations our DTW-based approach attains the highest average classification rate reaching in one case as high as 99% accuracy. Our approach does not require feature extraction, is capable of reusing a classifier across different cutting configurations, and it uses reasonably sized training sets. Although the resulting high accuracy in our approach is associated with high computational cost, this is specific to the DTW implementation that we used. Specifically, we highlight available, very fast DTW implementations that can even be implemented on small consumer electronics. Therefore, further code optimization and the significantly reduced computational effort during the implementation phase make our approach a viable option for in-process chatter detection.

artificial intelligence, machine learning, time series, (16 more...)

1908.01678

Country: North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.49)

#artificialintelligenceJul-23-2019, 04:56:13 GMT

K Nearest Neighbor Algorithm In Python - Towards Data Science

K-Nearest Neighbors, or KNN for short, is one of the simplest machine learning algorithms and is used in a wide array of institutions. KNN is a non-parametric, lazy learning algorithm. When we say a technique is non-parametric, it means that it does not make any assumptions about the underlying data. In other words, it makes its selection based off of the proximity to other data points regardless of what feature the numerical values represent. Being a lazy learning algorithm implies that there is little to no training phase.

health & medicine, k nearest neighbor algorithm, oncology, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Ortiz-Bejar, Jose, Tellez, Eric S., Graff, Mario

Feature space transformations and model selection to improve the performance of classifiers

arXiv.org Machine LearningJul-14-2019

Improving the performance of classifiers is the realm of prototype selection and kernel transformations. Prototype selection has been used to reduce the space complexity of k-Nearest Neighbors classifiers and to improve its accuracy, and kernel transformations enhanced the performance of linear classifiers by converting a non-linear separable problem into a linear one in the transformed space. Our proposal combines, in a model selection scheme, these transformations with classic algorithms such as Na\"ive Bayes and k-Nearest Neighbors to produce a competitive classifier. We analyzed our approach on different classification problems and compared it to state-of-the-art classifiers. The results show that the methodology proposed is competitive, obtaining the lowest rank among the classifiers being compared.